Main design and functionality
- Edited by: T. Hengl and H.I. Reuter
WorldGrids.org is a component of GSIF (Global Soil Information Facilities), which forms a major component of ISRIC's mandate to serve the international community by preserving soil data and delivering value-added products with global coverage in an open access format. The GSIF framework for soil data accessibility is largely inspired by Open Access initiatives for freely sharing environmental data sets (see e.g. Kleiner, 2011). The primary motivation for the construction of GSIF is the fundamental and pragmatic need for a comprehensive cyber-infrastructure to house collated legacy (historical) soil data, some of which is currently under threat of being lost forever. A secondary aim of GSIF is to further support the increasing awareness and understanding of the role of soils within major global issues by both providing, and supporting the generation of, full coverage soil information layers at various resolutions. Such value-added products are typically generated from legacy databases in combination with other publicly available covariate layers related to soil (such as elevation models, vegetation, land use and geology), producing information layers suitable to feed into, for instance, larger environmental and climate models.
To enable this secondary aim of provisioning and creation support, the WorldGrids.org data portal was launched in 2012 with the express purpose of collecting, storing, accessing and interacting with gridded (continuous) covariate data of various types. The WorldGrids.org portal will be used to serve gridded maps to users and contains tools for overlay and subsetting operations on the grids available on our servers (Web Processing Service). In order to provide further support to Digital Soil Mapping teams around the world, access to standardized global soil covariate layers derived, for example, from SRTM DEM, MODIS Terra products, and various climatic, land cover and geological maps will also be provided.
WorldGrids.org is designed as a (de)-centralized repository, meaning that ISRIC supports and encourages submissions of new data to the repository via the network of users. The functionality and the list of layers available on WorldGrids.org for production mapping will be continuously revised and extended. The portal provides:
- Access to thematic images at various resolutions (with main focus on the 1 km resolution global grids);
- Web Processing Service (WPS) that allows you to overlay, subset and aggregate grids;
- Instructions on how to install required software and load the grids into analysis;
- Complete metadata and previews of all grids.
Existing data portals for environmental covariates
At the global scale, two different strategies of data serving, common to many kinds of environmental data, can be distinguished. One strategy is to catalog large numbers of diverse data sets from various countries, resulting in data repositories that are thematically rich but inherently fragmented. The alternative strategy is to compile thematically smaller yet higher quality data sets, which are internally harmonized and externally consistent. At the global level, for example, the United Nations began with the UNEP Global Resource Information Database (GRID) initiative back in 1985. The objective of GRID was to provide data and information for decision and policy making. Likewise, the Global Change Master Directory (GCMD) from NASA contains over 25,000 metadata records which are accessible via the internet. At the continental level, various similar initiatives exist that serve even finer resolution data. In Europe, an initiative called INSPIRE was set up to serve European environmental data. Large amounts of environmental data are now publicly available for the African continent, e.g. via AfSIS, the Africa Soil Information Service. At the country level there are even more such initiatives, e.g. the Australian SDI (Busby & Kelly, 2004) or the Geospatial One Stop (GOS) in the US.
Unfortunately, continent-level map repositories often cannot be merged to produce global coverage layers. As Maguire & Longley (2005) conclude: “Coordination of parallel initiatives must reconcile different technology standards, administrative schema and funding regimes. Not surprisingly… bottom-up approaches to building inter-organizational enterprise GIS have met with limited success.” Looking at these numerous data portals, we can observe some disadvantages: (1) numerous data sets with different access and user rights exist, (2) data types, coordinate systems and resolutions differ in nonstandard ways across data sets, (3) methods of generating soil maps differ, and (4) data sets are most often fragmented in both time and space. In many cases production processes have been closed or undocumented, which makes it challenging to reproduce the provided data sets.
The WorldGrids.org initiative will test the implementation of a (de)-central repository for collecting, storing, accessing and interacting with gridded data sets of global soil covariate data for internal and external production mapping.
Overview of Data Sets available via WorldGrids.org
WorldGrids.org aims at serving data sets in the following thematic fields (see Fig. below): (a) geomorphometry, i.e. Digital Elevation Models and derived land surface parameters and objects, (b) spectral and multispectral remote sensing imagery and derived parameters, (c) climatic and meteorological covariates, (d) land cover/land use information, and (e) expert-based covariates, parent material and soil-unit maps (including soil delineations and catena models). Each of these thematic groups constitutes a separate sub-product and therefore requires its own tailored collection and processing strategy.
- See also: a list of layers available
WorldGrids.org follows some basic principles which are common to other data portal standards such as the MODIS data services. To ensure cohesion across themes, soil covariates submitted to WorldGrids.org must meet the following minimum requirements:
- conform to at least one standard resolution (e.g. 100 m, 250 m, 1 km, 2.5 km and/or 5.6 km),
- have global coverage with less than 5% of missing pixels for the domain of interest (land mask),
- be generated via reproducible production steps,
- be accompanied by complete metadata in addition to the original data,
- allow public access via WorldGrids.org, e.g. be registered under a Creative Commons license.
- Read more: submission instructions
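The coverage requirement (less than 5% of missing pixels within the land mask) can be checked before submission. The following is a minimal sketch, assuming the grid and the land mask are already loaded as NumPy arrays of the same shape and that missing pixels are flagged either by a nodata value or by NaN; the function names and the nodata value are illustrative and are not part of the WorldGrids.org tooling.

```python
import numpy as np

def missing_fraction(grid, land_mask, nodata):
    """Share of land pixels in `grid` that carry no valid value."""
    grid = np.asarray(grid, dtype=float)
    land = np.asarray(land_mask, dtype=bool)
    if grid.shape != land.shape:
        raise ValueError("grid and land mask must have the same shape")
    n_land = int(land.sum())
    if n_land == 0:
        raise ValueError("land mask contains no land pixels")
    # A pixel counts as missing if it equals the nodata flag or is NaN.
    missing = (grid == nodata) | np.isnan(grid)
    return float((missing & land).sum()) / n_land

def passes_coverage_check(grid, land_mask, nodata, max_missing=0.05):
    """True if fewer than `max_missing` of the land pixels are missing."""
    return missing_fraction(grid, land_mask, nodata) < max_missing

# Tiny illustrative grid: 4 land pixels, one of them missing.
grid = np.array([[1.0, -9999.0, 3.0], [4.0, 5.0, -9999.0]])
land = np.array([[1, 1, 1], [1, 0, 0]])
print(missing_fraction(grid, land, nodata=-9999.0))       # 0.25
print(passes_coverage_check(grid, land, nodata=-9999.0))  # False
```

Only pixels inside the land mask are counted, so missing values over the ocean do not affect the result.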
WorldGrids products and standard methods used
Each dataset (WorldGrids product) available at WorldGrids.org comes with an accompanying metadata file, a visualization specification (legend) and a processing script. The processing scripts are written for the R-project (extension *.R) or in Python (extension *.py) and might make use of several other packages. All processing scripts can be obtained via the Google Code project.
WorldGrids.org soil covariates are derived from various sources of input data, which are often unprocessed, incomplete or not fit to be used for global soil mapping. The above figure provides an overview of the main data sources used to produce the global coverage soil covariates. Note that most of the covariates available at WorldGrids.org are original products derived by the WorldGrids.org developers. Nevertheless, some 20-30% of the maps listed in the repository are simply reformatted/reprojected versions of the original grids, hence we strongly recommend that you always refer to and attribute the original data sources.
The most typical procedures used to derive soil covariates are (1) principal component analysis for time-series of images, (2) resampling the indicator values to memberships and (3) aggregating, generalizing or resampling maps with too much detail. The quantity and complexity of the input imagery is often high and the processing can take significant resources. We make frequent use of techniques such as principal component analysis or resampling, especially to process the MODIS-derived products. For example, the productive soil mask map shown below was derived from a long term time series of Leaf Area Index images (120 monthly images).
Eastman and Fulk (1993) have observed that principal component analysis is an attractive technique to analyze time-series of images and reduce their dimensionality. The first Principal Component (PC) of a long term monthly time-series of MODIS EVI images, for example, clearly represents the mean biomass, whilst the following PCs show vegetation seasonality, different land use practices, vegetation succession and degradation processes.
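The idea of reducing an image time-series to a few component images can be sketched in NumPy. This is a simplified illustration and not the WorldGrids.org processing script: it assumes a gap-free stack of shape (dates, rows, columns) in which no image is constant, standardizes each date, and obtains the components from a singular value decomposition. The function name is hypothetical.

```python
import numpy as np

def time_series_pca(stack, n_components=3):
    """Standardized PCA of an image stack with shape (T, H, W).

    Returns the first component images, shape (k, H, W), and the
    share of total variance explained by each of them.
    """
    stack = np.asarray(stack, dtype=float)
    t, h, w = stack.shape
    # One row per pixel, one column per date.
    x = stack.reshape(t, h * w).T
    # Standardize each date to zero mean and unit variance
    # (assumes no date has zero variance and there are no NaNs).
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    # SVD of the standardized matrix: x = u @ diag(s) @ vt.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u * s                      # component scores per pixel
    explained = s**2 / np.sum(s**2)     # variance share per component
    k = min(n_components, t)
    pc_images = scores[:, :k].T.reshape(k, h, w)
    return pc_images, explained[:k]
```

For a stack of 120 monthly images, `time_series_pca(stack, 3)` would return three images; the first one summarizes the pattern shared by all dates, which is consistent with the interpretation of PC1 given above.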
Note also that WorldGrids.org follows a standardized naming convention and data format for all layers: each submitted layer, its metadata and its Styled Layer Descriptor (SLD) file need to have the same name, and names must be unique between layers and no more than eight characters in length. The first three characters designate the variable type, e.g. TD1 (daily surface temperature PC1); the next three characters represent the data source or collection method, e.g. MDD (MODIS day time imagery); the 7th character is the effective scale, coded from 1 to 7 as seen in Table 1; and the 8th character is the product version number (a for v1, b for v2, c for v3).
Table 1. Standard resolutions and corresponding scale numbers.
|Standard Resolutions||arc degrees||~kilometers|
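The naming convention described above lends itself to a simple parser. The sketch below assumes that a name uses all eight characters; the example name `TD1MDD3a` is assembled from the codes mentioned in the text with an arbitrary scale digit and is not necessarily an existing layer.

```python
import re
from typing import NamedTuple

class LayerName(NamedTuple):
    variable: str   # characters 1-3: variable type, e.g. "TD1"
    source: str     # characters 4-6: data source, e.g. "MDD"
    scale: int      # character 7: scale number, 1 to 7
    version: int    # character 8: a -> 1, b -> 2, c -> 3, ...

_PATTERN = re.compile(r"^([A-Za-z0-9]{3})([A-Za-z0-9]{3})([1-7])([a-z])$")

def parse_layer_name(name):
    """Split an eight-character layer name into its four parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid layer name: {name!r}")
    variable, source, scale, letter = match.groups()
    version = ord(letter) - ord("a") + 1
    return LayerName(variable, source, int(scale), version)

print(parse_layer_name("TD1MDD3a"))
# LayerName(variable='TD1', source='MDD', scale=3, version=1)
```

Because the layer, its metadata and its SLD file share one name, the same parser can be applied to any of the three after stripping the file extension.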
Each dataset is described in ISRIC's metadata system under the catalog Worldgrids with a full ISO19139 record. ISO19139 is the XML implementation schema for ISO19115; it specifies details such as data type, abstract, point of contact, descriptive keywords, temporal and spatial extent, links to the processing description, data download and data quality.
- See an example of the metadata file
Methods for the portal
We have identified five generic questions which will be asked of WorldGrids.org in a production mapping context. First, we need to provide a list of the data sources available at WorldGrids.org (Method I). Secondly, we need to identify the information at a given location, where the simplest case would be extraction of the elevation at a single profile location (Method II). Similarly, if we have data from a large sampling campaign, we need to extract covariates for n points (Method III). In other cases, where we are building information for a polygon-based mapping approach, an aggregate function allows the extraction of statistical information (mean, min, max, standard deviation, etc.) for given regions (e.g. soil polygons, watersheds) (Method IV). One of the main applications of WorldGrids will be geostatistical mapping. When a (geo)statistical model is built, the significant covariates are determined, which in turn indicates which data subsets need to be downloaded from WorldGrids.org (Method V).
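To make Methods II and III concrete, the following sketch shows what a point lookup on a regular grid in geographic coordinates amounts to. It is a local illustration under stated assumptions (a north-up grid described by its upper-left corner and a square cell size, nearest-cell lookup without interpolation), not the code that runs on the server; all names are hypothetical.

```python
import math
import numpy as np

def sample_point(grid, x, y, x_min, y_max, cell_size):
    """Value of the cell containing (x, y), or None if outside the grid.

    x_min, y_max: coordinates of the upper-left corner of the grid.
    cell_size:    cell width and height in the same units as x and y.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)   # rows count downwards
    n_rows, n_cols = grid.shape
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        return None
    return grid[row, col]

def sample_points(grid, points, x_min, y_max, cell_size):
    """Method III analogue: repeat the lookup for many (x, y) pairs."""
    return [sample_point(grid, x, y, x_min, y_max, cell_size) for x, y in points]

# 3 x 4 toy grid covering 10E-14E and 10N-13N with 1-degree cells.
grid = np.arange(12).reshape(3, 4)
print(sample_point(grid, 11.3, 12.1, x_min=10.0, y_max=13.0, cell_size=1.0))  # 1
```

Points that fall outside the grid return `None` rather than raising an error, which is convenient when sampling a large campaign in which a few locations may lie outside the covered domain.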
The above methods have been implemented via the pyWPS (OGC compliant) Open Source Web Processing Service (WPS) framework (Čepický et al., 2012). Here we basically extend the methods available in the following packages: GDAL, NumPy and SciPy.
The success of data distribution strongly depends on the simplicity of its access. Therefore, we first introduce the simplest methods available via a browser. To improve ease of access, we have hidden the complexity of the processes executed in the browser behind wrapper functions developed for the different WPS functions (see GSIF package for R). In practice, we expect that most users will automate querying and operations through scripting or through some Graphical User Interface.
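As an illustration of what the aggregate function of Method IV computes, the sketch below derives per-zone statistics with NumPy from a value grid and a zone grid of the same shape. It is not the server-side implementation: the function name is hypothetical, the statistic names follow those listed for the stype parameter, and the standard deviation and variance are population statistics (ddof=0), which may differ from the convention used by the service.

```python
import numpy as np

# Statistic names follow the stype values described for Method IV.
STATS = {
    "sum": np.sum,
    "minimum": np.min,
    "mean": np.mean,
    "maximum": np.max,
    "sd": np.std,         # population standard deviation (ddof=0)
    "variance": np.var,   # population variance (ddof=0)
}

def zonal_stats(values, zones, stype="mean", nodata=None):
    """Return {zone id: statistic} for every zone with valid pixels."""
    if stype not in STATS:
        raise ValueError(f"unknown statistic: {stype!r}")
    values = np.asarray(values, dtype=float)
    zones = np.asarray(zones)
    if values.shape != zones.shape:
        raise ValueError("values and zones must have the same shape")
    # Ignore NaN and, if given, the nodata flag.
    valid = ~np.isnan(values)
    if nodata is not None:
        valid &= values != nodata
    result = {}
    for zone_id in np.unique(zones[valid]):
        in_zone = valid & (zones == zone_id)
        result[zone_id.item()] = float(STATS[stype](values[in_zone]))
    return result

values = np.array([[1.0, 2.0], [3.0, 4.0]])
zones = np.array([[1, 1], [2, 2]])
print(zonal_stats(values, zones, "mean"))  # {1: 1.5, 2: 3.5}
```

In practice the zone grid would be obtained by rasterizing the polygons (e.g. soil units or watersheds) onto the grid of the covariate layer.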
Accessing the data in Browser
The simplest way to perform a given Method is to execute it from within a browser, which will display the resulting XML file. A list of generic WPS functions (Methods) and input parameters is available in Table 2.
Table 2. Command settings for the different Methods.
|Method||Identifier||Data inputs|
|II (1pt)||sampler_local1pt_nogml||x, y, inRastername|
|III (many pt)||sampler_local||inGML, inRastername|
|IV (zonalstats)||overlay||inZone, inRastername, stype|
|V (subset)||subset||bbox, inRastername|
Here inRastername corresponds to the chosen WorldGrids.org dataset, x and y to the respective point location in geographic coordinates, inGML specifies the URL of a GML file containing multiple point locations, stype is the type of statistic reported (one of the following: sum, minimum, mean, maximum, sd (i.e. standard deviation) and variance), and bbox is the bounding box, specified by its lower left and upper right corners in geographic coordinates.
For Method II, to retrieve the value of the raster biocl15 (see Datainput: inRastername) at the location 11.3E (Datainput x) and 12.1N (Datainput y), we would use:
http://wps.worldgrids.org/pywps.cgi?service=wps&version=1.0.0&request=execute&identifier=sampler_local1pt_nogml&datainputs=[x=11.3;y=12.1;inRastername=biocl15]
The output from the above call would be the following:
Where reference to a downloadable file is desired instead of the standard XML file display in the browser, we would add the following at the end:
Similar calls would be executed for any of the other methods, with the “identifier” of the Execute request and the “datainputs” changed accordingly.
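When such calls are scripted, the request URL can be assembled from the identifier and the data inputs listed in Table 2. The following sketch only builds the URL string and does not send a request; the function name is hypothetical, and the endpoint and parameter layout are taken from the example call above.

```python
from urllib.parse import urlencode

WPS_ENDPOINT = "http://wps.worldgrids.org/pywps.cgi"

def build_execute_url(identifier, inputs, endpoint=WPS_ENDPOINT):
    """Assemble a WPS 1.0.0 Execute URL in key-value-pair form.

    identifier: process name, e.g. "sampler_local1pt_nogml"
    inputs:     dict of data inputs, e.g. {"x": 11.3, "y": 12.1, ...}
    """
    # Data inputs are written as [name=value;name=value;...].
    datainputs = "[" + ";".join(f"{k}={v}" for k, v in inputs.items()) + "]"
    query = urlencode(
        {
            "service": "wps",
            "version": "1.0.0",
            "request": "execute",
            "identifier": identifier,
            "datainputs": datainputs,
        },
        safe="[]=;",  # keep the bracketed list readable, as in the example
    )
    return f"{endpoint}?{query}"

url = build_execute_url(
    "sampler_local1pt_nogml",
    {"x": 11.3, "y": 12.1, "inRastername": "biocl15"},
)
print(url)
```

The same function can be reused for the other methods by changing the identifier and the inputs, for instance `build_execute_url("overlay", {"inZone": ..., "inRastername": ..., "stype": "mean"})`, where the zone input has to be supplied by the user.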
Accessing the data in ArcGIS/Python
As the majority of professional GIS users are familiar with the ArcGIS suite of products, they are more likely to use it than any browser functionality. The GSIF toolbox for accessing the WorldGrids.org functionality can be found at the GlobalSoilMap.net homepage. Whilst the Graphical User Interface shown in the figure below is an example of a frontend, the underlying functionality is written in Python and can therefore be easily adapted to any other GIS system (GRASS, QGIS, TNT, Microstation).
WorldGrids can also be accessed via the GSIF package for R. See some examples.
- Read more: Python scripts
New Data submission
Everyone is welcome to contribute to, or suggest, global data sets for the WorldGrids.org repository. The preferred data format is GeoTIFF and the target scale is 1 km (5 km layers are also welcome). Each potential new layer must also be accompanied by metadata and a processing description as outlined previously. A new layer can be submitted by sending an email with data details, metadata records, processing scripts and data location to:
- E-mail: email@example.com
If the submitted data set satisfies the minimum requirements and passes a quality control/assessment process, you will be notified when the data is introduced into the system.
- Read more: submission instructions
To ensure compatibility of all gridded products, each continental and national node could focus on producing a consistent minimum subset of finer resolution covariates, following the same spatial reference (grid system) and data format as outlined above. To assist in compatibility and implementation, the authors have produced a 'cookbook' for regional server installation as well as documentation for all scripts used in data preparation. We estimate that for a continental node, given availability of IT infrastructure and trained personnel, a period of one week should be sufficient to get the server established and running (read more: A cookbook to install WPS). Population of the server with the requisite data is estimated to take between 6 and 12 months, depending on the production aims and processing capabilities within the node. This time frame does not account for any maintenance issues on the server or uploading of data.
- Example of a regional node: AfricaSoils.net
Future developments will focus on the following points:
- Versioning system: WorldGrids.org is intended to run on a versioning system that allows a user to trace which data sets have been used in various productions (this is currently done via the Google Code tracker).
- File traffic tracking system: WorldGrids.org requires tracking of downloads so that resources can be dedicated to the most used layers.
- Easy export methods: expansion of formats for data export. At the moment we support GeoTIFF for export; further formats are possible. The question also remains open whether any additional services like OpenDAP or THREDDS are required.
- Multilingual support: WorldGrids.org is aimed at international users, hence support for multilingual data search and interpretation is a logical extension.
- Full data processing automation: the processing steps should ideally offer complete automation so that the structure of the portal can extend to allow a seamless transition between core data sets and remote resources.
- Server stability optimization: appropriate resourcing will need to be determined to ensure the stability of the portal, its adequate performance under the expected access patterns and its scalability.
- Active and large community of users and developers: the role of ISRIC is to support the coalescence of users and stakeholders into a participatory community. Such public data portals only have relevance if the community of active users grows and influences the development of the portal.
- Busby, J.R. and Kelly, P., 2004. Australian spatial data infrastructures. In Proceedings of the 7th international global spatial data infrastructure conference, Bangalore, India, February 2–4 (10 pp).
- Čepický, J. and PyWPS Development team, 2012. pyWPS – an OGC conforming WPS implementation in python. http://pywps.wald.intevation.org/ last accessed 29-02-2012.
- Eastman, J.R. and Fulk, M., 1993. Long sequence time series evaluation using standardized principal components. Photogrammetric Engineering and Remote Sensing, 59(6): 991-996.
- Kleiner, K., 2011. Data on demand. Nature Climate Change, 1(1): 10-12.
- Maguire, D. and Longley, P., 2005. The emergence of geoportals and their role in spatial data infrastructures. Computers, Environment and Urban Systems, 29: 3–14.