Introducing Provenance Capture into a Legacy Data System

Publication Type: Journal Article
Year of Publication: 2013
Authors: Conover, H.; Ramachandran, R.; Beaumont, B.; Kulkarni, A.; McEniry, M.; Regner, K.; Graves, S.
Journal: IEEE Transactions on Geoscience and Remote Sensing
Date Published: 11/2013
ISSN Number: 0196-2892
Keywords: Browsers, Communities, Context, Data management, data processing, Data systems, Geoscience, geospatial data, metadata standards, provenance, science data systems, Software, standards
Abstract: Accurate provenance information improves understanding of Earth science data, supports scientific reproducibility, and can serve as an indicator of data quality. Provenance capture is an integral part of many modern workflow systems but may not have been considered in the design of legacy data production systems. In addition to data lineage, it is important to capture the contextual information needed to understand how a data set was produced. This paper describes our experience in retrofitting a legacy data system to support capture, storage, and dissemination of provenance. Data inputs and transformations are logged automatically, while broader context information describing science algorithms and ancillary files is compiled manually. Provenance and context information are integrated for interactive user access and embedded into data files as XML documents compliant with the “Lineage” specification for geographic metadata defined by the International Organization for Standardization in the ISO 19115-2 standard. Lessons learned from this approach can inform others who need to incorporate provenance into a data system after the fact.
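The abstract pairs automatic logging of data inputs and transformations with serialization of that record as ISO 19115-2 lineage XML. The Python sketch below illustrates the idea under stated assumptions: `ProvenanceLog`, `record_step`, and `to_lineage_xml` are hypothetical names, not the authors' code, and the element layout (a `gmd:LI_Lineage` holding `gmi:LE_ProcessStep` and `gmi:LE_Source` elements, with text wrapped in `gco:CharacterString`) is assumed from the ISO 19139-style XML encoding and has not been validated against the official schemas. Embedding the resulting XML into the data files themselves is not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Namespaces of the ISO 19139-style encoding used by ISO 19115-2 (gmi) metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmi": "http://www.isotc211.org/2005/gmi",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, tag: str) -> str:
    """Return a Clark-notation qualified name such as '{uri}tag'."""
    return f"{{{NS[prefix]}}}{tag}"


def char_string(parent: ET.Element, prefix: str, tag: str, text: str) -> None:
    """Append <prefix:tag><gco:CharacterString>text</...></...> to parent."""
    wrapper = ET.SubElement(parent, q(prefix, tag))
    ET.SubElement(wrapper, q("gco", "CharacterString")).text = text


@dataclass
class ProcessStep:
    description: str
    inputs: list[str]
    when: datetime


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory log, filled in automatically as a job runs."""
    steps: list[ProcessStep] = field(default_factory=list)

    def record_step(self, description: str, inputs: list[str]) -> None:
        # Timestamp each transformation in UTC at the moment it is logged.
        now = datetime.now(timezone.utc)
        self.steps.append(ProcessStep(description, list(inputs), now))

    def to_lineage_xml(self, statement: str) -> str:
        """Serialize the logged steps as an (assumed) LI_Lineage fragment."""
        lineage = ET.Element(q("gmd", "LI_Lineage"))
        char_string(lineage, "gmd", "statement", statement)
        for step in self.steps:
            wrapper = ET.SubElement(lineage, q("gmd", "processStep"))
            ps = ET.SubElement(wrapper, q("gmi", "LE_ProcessStep"))
            char_string(ps, "gmd", "description", step.description)
            dt = ET.SubElement(ps, q("gmd", "dateTime"))
            ET.SubElement(dt, q("gco", "DateTime")).text = step.when.isoformat(
                timespec="seconds"
            )
            # One LE_Source per logged input file.
            for name in step.inputs:
                src_wrapper = ET.SubElement(ps, q("gmd", "source"))
                src = ET.SubElement(src_wrapper, q("gmi", "LE_Source"))
                char_string(src, "gmd", "description", name)
        return ET.tostring(lineage, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only; the step and file names are hypothetical.
    log = ProvenanceLog()
    log.record_step("Resample swath data to a grid", ["granule_a.hdf", "granule_b.hdf"])
    print(log.to_lineage_xml("Produced by a hypothetical legacy pipeline."))
```

Keeping the log as plain data and serializing only at the end mirrors the separation the abstract describes: capture happens automatically during processing, while the standards-compliant XML is generated once for dissemination. Manually compiled context (algorithm descriptions, ancillary files) could be merged in at that final step.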