
Data Access Technical Discussion

AASG Project Partner Data Delivery to the NGDS

For the AASG geothermal data project, data will be considered delivered from project partners to the NGDS system when it is locatable using an NGDS catalog search, and accessible via the web according to procedures described in the metadata record obtained from the NGDS catalog.

The first step to make any information resource part of the system is to create a metadata record describing the resource and how it can be obtained.  For guidance on metadata preparation, see Metadata.

Access to a resource via the web may be file-based, via web-application, or service-based:

File-based approaches:

  • At the simplest level, a file-based representation of the resource in whatever format it currently exists is made available at some permanent web location (identified by a URL).
  • The information is loaded in a structured format that can be processed using computer software, and made available as a file. To be useful, the metadata must fully describe the data structure, e.g. by defining the tables and the fields in each table.
  • The information is presented using a documented, community data structure and made available as a file. In this case the metadata can point to the data model specification that is used.

Web applications

Information resources can be made available to users via web-browser-based applications that allow users to browse, view, process or analyze, or download data in various ways. Such approaches can provide useful functionality, but do not lend themselves to interoperability or resource reuse, because the application functionality is typically tightly coupled to a particular data source. In such a case, application functions cannot easily be applied to other data sources, and the data cannot be accessed directly by other applications. Download functions commonly allow some filtering and selection of data subsets, but these operations require direct human interaction. The acquired files may be structured or unstructured (see above).

In general the AASG Geothermal Data project does not support development of web applications for data access. It is our intention that structured data be provided through web services, and that applications developed for NGDS use the service interfaces to acquire the data that they operate on. In this fashion the work of the data compilation, data hosting, and application development agents in the system can be decoupled.

Examples:

Nevada Bureau of Mines and Geology Geothermal Web Application

Geothermal Prospector

EarthChem

System for Earth Sample Registration

Service-based approaches

In pursuit of an open data/linked data approach to making data available to end users, our ultimate objective is to make all structured data available through web services using well-documented community specifications for service protocols and data interchange formats. This sort of data delivery requires more sophisticated client and server software stacks, and more rigid quality control of the content; because of the additional up-front cost, only widely available and useful data are deemed of sufficient value to warrant web service delivery. Web services may be deployed using custom data schema corresponding to existing database holdings, but the goal is to facilitate use by data consumers by delivering data of a particular type in a consistent schema. The AASG project is currently using simple feature data interchange schema for service-based delivery; these are 'flat' file formats that can be represented as simple spreadsheets or text tables with no information loss. They are developed as content models independent of a particular implementation, using Excel spreadsheets (see Content Models).

AASG Project Partners can implement web services on their own servers (preferred approach), at one of the regional project hubs located at the Kentucky, Illinois, Arizona, and Nevada state geological surveys, or by arrangement with another agency. Data delivered by NGDS services should conform to one of the NGDS content models, but this is negotiable in the annual statement of work process. The anticipated delivery process must be defined in the partner's Statement of Work and approved by the project management team before the main data compilation phase of a data development cycle (step 3 in Figure 1, in the Data Development Cycle).

Metadata

Metadata should be created and submitted to an NGDS catalog for any resource that is meant to be accessible individually via the web.

Individual documents require one metadata record per document. Some document types may consist of a bundle of files, e.g. an ESRI shapefile. In general these should be bundled into a single file such as a zip archive or UNIX tar file. The metadata must include the URL at which the document can be accessed. These documents might be scans of well logs, scanned reports or publications, or data in a spreadsheet, such as an Excel file.
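As an illustration of the bundling step, the following is a minimal Python sketch that collects the component files of a shapefile into one zip archive. The function name, the file names, and the list of extensions are assumptions made for the example; they are not project requirements, and a given shapefile may carry other sidecar files.

```python
import zipfile
from pathlib import Path

# Common shapefile component extensions (an assumed, non-exhaustive list).
SHAPEFILE_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj", ".cpg"}


def bundle_shapefile(shp_path, archive_path):
    """Zip every sibling file that shares the .shp file's base name.

    Returns the names of the files written to the archive.
    """
    shp = Path(shp_path)
    members = sorted(
        p for p in shp.parent.iterdir()
        if p.stem == shp.stem and p.suffix.lower() in SHAPEFILE_EXTENSIONS
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for member in members:
            # Store only the file name so the archive has no directory prefix.
            archive.write(member, arcname=member.name)
    return [m.name for m in members]
```

The resulting archive is the single file whose URL would be recorded in the metadata record.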

Datasets (data products) are typically considered as individual works (see FRBR), unified by the compilation activity that brings information together into a single data structure, editing and verifying content as necessary. This approach is based on our interest in facilitating data access by users, and the observation that such a user is first interested in a particular kind of data, and upon finding a fit-for-purpose dataset, will next want to know how to get the data. From this perspective a dataset work (data product, resource) will have a single metadata record that may include specification of multiple distributions of the resource. For instance a borehole temperature dataset may be available with all the records in an Excel spreadsheet, or visualized through a web map service, or individual observations may be accessed through a web feature service (see for example Montana Hot Springs).
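The one-record, many-distributions idea can be sketched as a small data structure. This is only an illustration: the class names, field names, and example.org URLs are hypothetical and do not reflect the USGIN metadata profile or any catalog API.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    kind: str  # hypothetical labels, e.g. "file download", "WMS", "WFS"
    url: str


@dataclass
class DatasetMetadata:
    title: str
    distributions: list = field(default_factory=list)

    def urls_for(self, kind):
        """Return the access URLs for one kind of distribution."""
        return [d.url for d in self.distributions if d.kind == kind]


# One metadata record describing a single dataset work with three distributions.
# All URLs are placeholders.
record = DatasetMetadata(
    title="Borehole temperature observations",
    distributions=[
        Distribution("file download", "http://example.org/data/borehole-temperatures.xls"),
        Distribution("WMS", "http://example.org/services/borehole-temperatures/wms"),
        Distribution("WFS", "http://example.org/services/borehole-temperatures/wfs"),
    ],
)
```

A user who finds this record in a catalog search can then pick the distribution that suits their purpose, for example `record.urls_for("WFS")` for feature-level access.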

At a more granular level, individual records (features, objects) in a dataset may include source information, documenting details of observation or measurement procedure and other information specific to a particular data type. This might include information such as location, date and time of observations, and the source of the data. These feature-level metadata are delivered with the data, and only summarized in the work-level dataset metadata that are published to the NGDS-compliant catalog. This granularity issue can be difficult because of differing perspectives on what is data or metadata, differing granularity of documentation available, and different use-case priorities.

The required metadata content is explained in USGIN metadata recommendations. These requirements prescribe the content of the metadata, but not the delivery format. ISO19139 XML is the preferred encoding based on its expanding adoption in the community. USGIN guidelines for implementing the content recommendations in ISO19139 XML are available, as well as a detailed USGIN profile for interoperable metadata using ISO19139. Individual metadata records for simple (single-distribution) resources can be created with the USGIN Metadata Wizard. If files need to be uploaded to a repository (because the contributor does not have a host they can use to make the files available), the USGIN repository can be used.

FGDC XML is widely used, and if participants already have a workflow in place using this format and can provide the requested metadata content, this can be made to work. FGDC XML should be tested to confirm that it validates against the official XML schema at http://www.fgdc.gov/schemas/metadata/fgdc-std-001-1998.xsd. Please confer with the AZGS developer team about metadata formatting to facilitate import of metadata into the NGDS catalog.

Content Models and XML schema

As a starting point to promote interoperability, AASG geothermal data to be provided using NGDS services will conform to content models that are presented as Excel workbooks listed in the Content Model Templates page. The content model defines the information that will be associated with a feature or observation type; the content model may be implemented in a variety of ways, but USGIN is currently implementing these interchange formats as GML Simple Features to be served by an OGC WFS. If data to be served are not accounted for by an existing content model, network participants are invited to propose new models. A document with guidelines for construction of a template workbook is available here.
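For readers unfamiliar with WFS, the sketch below shows how a client might build a GetFeature request using the standard key-value parameters of WFS 1.1.0. The endpoint URL and the feature type name are placeholders assumed for the example; the actual values come from the metadata record for the service.

```python
from urllib.parse import urlencode


def get_feature_url(endpoint, type_name, max_features=10):
    """Build a WFS 1.1.0 GetFeature request URL (key-value pair encoding)."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,        # feature type defined by the content model
        "maxFeatures": max_features,  # limit the number of features returned
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and feature type name, for illustration only.
url = get_feature_url("http://example.org/services/wfs", "aasg:ThermalSpring")
```

The response to such a request is a GML document whose features follow the schema that implements the content model.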

Development of content models during the first year of the project has been an organic process. The models have evolved rapidly as production-scale data compilation has gotten under way. WFS services have been deployed as the models evolve, and iteration of model versions and XML schema for corresponding WFS features has resulted in some discrepancies between the Excel workbooks and corresponding XML documents. During early September 2011, the AZGS team reviewed and synchronized the deployed services and XML schema to produce a collection of schema that will validate the deployed services. XML schema for AASG geothermal data WFS features are being placed in a repository at http://schemas.usgin.org/schemas/.

XML schema are versioned, and the namespace for the schema elements is unique to that schema version. Thus namespace-aware client applications can determine if an instance document is using a known schema version.
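A namespace-aware check of this kind might look like the following sketch, which uses Python's standard xml.etree.ElementTree module. The namespace URIs and element names are invented for the example and do not correspond to actual AASG schema versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per schema version the client understands.
KNOWN_SCHEMA_NAMESPACES = {
    "http://example.org/schemas/thermalspring/1.5",
    "http://example.org/schemas/thermalspring/1.6",
}


def document_namespaces(xml_text):
    """Return the set of namespace URIs used by elements in the document."""
    namespaces = set()
    for element in ET.fromstring(xml_text).iter():
        # ElementTree writes qualified tags as "{namespace-uri}localname".
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            namespaces.add(element.tag[1:].split("}", 1)[0])
    return namespaces


def known_schema_versions(xml_text):
    """Return the known schema namespaces that the instance document uses."""
    return document_namespaces(xml_text) & KNOWN_SCHEMA_NAMESPACES
```

An empty result tells the client that the instance document uses a schema version it does not recognize, so it can decline to process the document rather than misread it.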
