Data Management Practices and Policies for Fermilab Experiments

Overview: Fermilab is the lead laboratory for many particle physics experiments. Some of those experiments use the lab’s accelerator facility, while many do not. Whether or not they are part of the Fermilab facility, these experiments all go through multiple phases in their lifecycle, from conceptual design, to prototype detectors in test beams, to full-blown operating experiments, and ultimately to final data analysis, data archival, and knowledge preservation. While the needs for a data management plan vary in detail over the experiment lifecycle, the plans share common themes with respect to digital data and how it is treated.

A key deliverable of each experiment is a digital record of the data representing selected physics of interest, whether that is a cosmic ray passing through a detector or a digital snapshot of the sky. Fermilab provides the experiments for which it is the lead laboratory with the means to store, manage, access, and share the raw data and simulation data, as well as all of the research-dependent reconstruction and calibration data.

Policy: It is the policy of the facility to provide long-term data storage and data access to all of the experiments and scientific programs associated with the facility in order to ensure the integrity, availability, and safekeeping of the data products, associated conditions data, and relevant simulation data. How long the data is stored and subsequently archived is typically negotiated on a case-by-case basis, depending on the needs and uniqueness of the experimental data being captured. However, the default is that Fermilab will keep the data active for a minimum of 5 years after data taking ceases. All experiments in which Fermilab is the lead laboratory will use Fermilab as their primary repository for the data. Other copies of the data may exist throughout the world, depending on the experiments’ unique needs.

Resources Available: See Computing and Data Resources at Fermilab for an overall description of available resources. Each experiment will be provided a minimal baseline level of support for its data needs to get started. A yearly Portfolio Review is held, and resource allocations for each experiment are made taking into account the facility’s total demand for computing, budgets, and scientific priorities. Exceptional needs may require that additional funding be secured in order to satisfy them.

Data Validation: Each experiment is expected to take ownership of its own operation, though it will receive ample support from the laboratory. Each experiment is responsible for the integrity of all public scientific research signed by its collaboration. A vigorous internal review process prior to public presentation handles that validation.

Data availability and sharing: Fermilab provides the means for experiments’ researchers to access data from anywhere in the world and to share data. A variety of systems are available to meet wide-ranging needs; the appropriate technology will be chosen consistent with a particular experiment’s requirements.

Researcher Responsibilities: Each experiment is responsible for deciding who may have access to its data. The default is to allow only members of the collaboration to access it. Fermilab provides tools to implement the access decisions once it is informed of them.

The experiment must, in conjunction with the laboratory, define the long-term retention period for its data products consistent with the policy described in Computing and Data Resources at Fermilab. The overall retention period will be determined on a case-by-case basis, depending on the scientific relevance of the data and the cost required to maintain it. Individual agreements will be established between each experiment and the facility to document these requirements.

Any experiment that wants something other than the default parameters is expected to contact the head of the Computational Science and AI Directorate and the CIO to establish a special Service Level Agreement (SLA) to which all parties agree.

This plan complies with: https://www.energy.gov/datamanagement/doe-policy-digital-research-data-management