One of the emerging, and soon to be defining, characteristics of science research is the collection, usage and storage of immense amounts of data. In fields as diverse as medicine, astronomy and economics, large data sets are becoming the foundation for new scientific advances.
A new project led by University of Notre Dame researchers will explore solutions to the problems of preserving data, analysis software and computational workflows, and how these relate to the results obtained from analyzing large data sets.
Titled “Data and Software Preservation for Open Science (DASPOS),” the National Science Foundation-funded $1.8 million program is focused on high energy physics data from the Large Hadron Collider (LHC) and the Fermilab Tevatron.
The research group is led by Mike Hildreth, a professor of physics; Jarek Nabrzyski, director of the Center for Research Computing with a concurrent appointment as associate professor of computer science and engineering; and Douglas Thain, associate professor of computer science and engineering. The group will also survey and incorporate the preservation needs of other research communities, such as astrophysics and bioinformatics, where large data sets and the results derived from them are becoming the core of emerging science.
“The program will include several international workshops and the design of a prototype data and software preservation architecture that meets the functionality needed by the scientific disciplines,” Hildreth said. “What is learned from building this prototype will inform the design and construction of the global data and software preservation infrastructure for the LHC, and potentially for other disciplines.”
The multidisciplinary DASPOS team includes particle physicists, computer scientists and digital librarians from Notre Dame, the University of Chicago, the University of Illinois Urbana-Champaign, the University of Nebraska at Lincoln, New York University and the University of Washington.