Key Provenance of Earth Science Observational Data Products

TitleKey Provenance of Earth Science Observational Data Products
Publication TypeConference Proceedings
Year of Publication2011
AuthorsConover, H, Plale, B, Aktas, M, Ramachandran, R, Purohit, P, Jensen, S, Graves, S
Conference NameAmerican Geophysical Union
Date Published12/2011
Conference LocationSan Francisco, CA
Keywordsuser interfaces
AbstractAs the sheer volume of data increases, particularly evidenced in the earth and environmental sciences, local arrangements for sharing data need to be replaced with reliable records about the what, who, how, and where of a data set or collection. This is frequently called the provenance of a data set. While observational data processing systems in the earth sciences have a long history of capturing metadata about the processing pipeline, current processes are limited in both what is captured and how it is disseminated to the science community. Provenance capture plays a role in scientific data preservation and stewardship precisely because it can automatically capture and represent a coherent picture of the what, how and who of a particular scientific collection. It reflects the transformations that a data collection underwent prior to its current form and the sequence of tasks that were executed and data products applied to generate a new product. In the NASA-funded Instant Karma project, we examine provenance capture in earth science applications, specifically the Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) Science Investigator-led Processing system (SIPS). The project is integrating the Karma provenance collection and representation tool into the AMSR-E SIPS production environment, with an initial focus on Sea Ice. This presentation will describe capture and representation of provenance that is guided by the Open Provenance Model (OPM). Several things have become clear during the course of the project to date. One is that core OPM entities and relationships are not adequate for expressing the kinds of provenance that is of interest in the science domain. OPM supports name-value pair annotations that can be used to augment what is known about the provenance entities and relationships, but in Karma, annotations cannot be added during capture, but only after the fact. This limits the capture system’s ability to record something it learned about an entity after the event of its creation in the provenance record. We will discuss extensions to the Open Provenance Model (OPM) and modifications to the Karma tool suite to address this issue, more efficient representations of earth science kinds of provenance, and definition of metadata structures for capturing related knowledge about the data products and science algorithms used to generate them. Use scenarios for provenance information is an active topic of investigation. It has additionally become clear through the project that not all provenance is created equal. In processing pipelines, some provenance is repetitive and uninteresting. Because of the volume of provenance, this obscures what are the interesting pieces of provenance. Methodologies to reveal science-relevant provenance will be presented, along with a preview of the AMSR-E Provenance Browser.