Applying the Karma Provenance tool to NASA’s AMSR-E data production stream

Title: Applying the Karma Provenance tool to NASA’s AMSR-E data production stream
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Ramachandran, R, Conover, H, Regner, K, Movva, S, Goodman, M, Plale, B, Purohit, P, Sun, Y
Conference Name: American Geophysical Union
Date Published: 12/2010
Conference Location: San Francisco, CA
Abstract: Current procedures for capturing and disseminating provenance, or data product lineage, are limited in both what is captured and how it is disseminated to the science community. For example, the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) Science Investigator-led Processing System (SIPS) generates Level 2 and Level 3 data products for a variety of geophysical parameters. Data provenance and quality information for these data sets is either very general (e.g., user guides, a list of anomalous data receipt and processing conditions over the life of the missions) or difficult to access or interpret (e.g., quality flags embedded in the data, production history files not easily available to users). Karma is a provenance collection and representation tool designed and developed for data-driven workflows such as the production streams used to produce EOS standard products. Karma records uniform and usable provenance metadata independent of the processing system while minimizing both the modification burden on the processing system and the overall performance overhead. Karma collects both process and data provenance. The process provenance contains information about the workflow execution and the associated algorithm invocations. The data provenance captures metadata about the derivation history of the data product, including the algorithms used and the input data sources transformed to generate it. As part of an ongoing NASA-funded project, Karma is being integrated into the AMSR-E SIPS data production streams. Metadata gathered by the tool will be presented to data consumers as provenance graphs, which are useful in validating the workflows and determining the quality of the data product. This presentation will discuss design and implementation issues faced while incorporating a provenance tool into a structured data production flow. Prototype results will also be presented.