Fermilab Data Management Guidelines & Policies

The Scientific Program Committee (SPC) allocates resources for each program year. In addition to compute hours, a project may be allocated disk or tape storage. Each project should have a data management plan covering the full lifetime of its data. At a minimum, the plan must include:

  1. how much space you will need, 
  2. where you will store copies of precious data, 
  3. what you will do with the data at the end of the program year, and
  4. the individual designated as data manager, who is responsible for overseeing this plan.

See the "End of the program year data housekeeping" section below for more information.

Responsibility for the data

The project and its users own the data and are responsible for managing and preserving it. The Filesystems web page lists which storage areas are backed up. Even so, users should keep their own copies of precious files, preferably off-site.

For example, our Lustre filesystem is not backed up and is intended as volatile scratch space. Users are responsible for keeping copies of precious data stored on Lustre in an alternate location that is backed up.

To that end, if your project will need storage space at another site or institution, please arrange for that space well in advance: storage requests can take time to be approved.

Quota Management

Your Lustre storage is subject to a quota based on the disk storage allocated by the Scientific Program Committee. We will set up a Unix group (GID) with a name similar to your project's, and the quota is enforced within Lustre at the group level.
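Because the quota is charged to the group, every byte and file owned by the project's GID counts against it, and Lustre quotas can limit both space and file (inode) counts. The authoritative figures come from Lustre itself (for example, `lfs quota -g <group> <filesystem>`). The Python sketch below only illustrates the accounting: it walks a directory tree and tallies allocated space and inodes per GID. It assumes Linux, where `st_blocks` counts 512-byte units, and the name `usage_by_group` is hypothetical.

```python
import os
from collections import defaultdict


def usage_by_group(root):
    """Tally allocated bytes and inode counts per GID for a directory tree.

    Sizes come from st_blocks * 512 (allocated space on Linux), which is
    closer to what a block quota charges than the apparent file size.
    Symbolic links are counted as inodes but never followed.
    """
    totals = defaultdict(lambda: [0, 0])  # gid -> [bytes, inodes]

    def charge(path):
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except FileNotFoundError:
            return  # removed while we were walking; skip it
        totals[st.st_gid][0] += st.st_blocks * 512
        totals[st.st_gid][1] += 1

    charge(root)  # os.walk never lists root as an entry, so count it here
    for dirpath, dirnames, filenames in os.walk(root):
        # Each entry is charged exactly once, from its parent's listing.
        for name in dirnames + filenames:
            charge(os.path.join(dirpath, name))
    return {gid: (nbytes, ninodes) for gid, (nbytes, ninodes) in totals.items()}
```

Run against a project tree such as `/lustre1/projectname`, nearly all usage should appear under one GID; anything listed under another GID is charged to that other group's quota, not the project's.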

Because quotas are managed by group ownership, all files and directories under the project’s top-level directory (i.e., /lustre1/projectname) must be owned by the allocated group. We set permissions on the top-level “projectname” directory so that any files or subdirectories created beneath it inherit its group ownership. LQCD-Admin staff will review group ownership periodically and reassign any files or directories with an incorrect group to the project’s allocation GID. This ensures that each project’s quota is enforced correctly.
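Inheriting group ownership from a parent directory is the standard Unix setgid-bit behavior on directories, and it applies only when a file is created: a file moved in with `mv`, or copied with ownership preserved (e.g. `cp -p` or `rsync -a`), can keep its original group. The hypothetical helper below, a sketch rather than the tool LQCD-Admin uses, lets a data manager find such strays ahead of the periodic review. It reports paths whose GID differs from the project's and directories missing the setgid bit.

```python
import grp
import os
import stat


def group_gid(group_name):
    """Look up the numeric GID for a Unix group name."""
    return grp.getgrnam(group_name).gr_gid


def iter_tree(root):
    """Yield root and every path beneath it, without following symlinks."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def audit_project_tree(root, gid):
    """Return (wrong_group, missing_setgid) lists for the tree at root.

    wrong_group    -- paths whose group owner is not `gid`
    missing_setgid -- directories without the setgid bit, whose new
                      children would not inherit the directory's group
    """
    wrong_group, missing_setgid = [], []
    for path in iter_tree(root):
        try:
            st = os.lstat(path)
        except FileNotFoundError:
            continue  # deleted during the walk
        if st.st_gid != gid:
            wrong_group.append(path)
        if stat.S_ISDIR(st.st_mode) and not st.st_mode & stat.S_ISGID:
            missing_setgid.append(path)
    return wrong_group, missing_setgid
```

For example, `audit_project_tree("/lustre1/projectname", group_gid("projectname"))`, where `projectname` stands for your project's group. The owner of a reported file can correct it with `chgrp`, and a reported directory with `chmod g+s`.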

Home-area quotas are set per user (UID) and are entirely separate from the project’s Lustre quota.

Tape vs Disk storage

Magnetic tape provides bulk storage that is currently cheaper per GB than either magnetic or solid-state disk. In aggregate, however, tape is still a significant recurring cost for the project. Projects may request short-term tape storage for their active life, including any extensions, and are expected to remove short-term data once they become inactive.

Projects may request long-term magnetic tape storage for critical results or community data such as QCD gauge configurations. Such space is typically granted for the lifetime of the tape media plus a single future migration to new tape technology or media. Additional migrations may incur significant costs to the researchers and the USQCD collaboration. Projects must request long-term storage as part of their proposal and data management plan. The amount of storage and the hosting institutions will be negotiated among the SPC, the EC, and the USQCD facilities.

End of the program year data housekeeping

At the end of the allocation year, each project must arrange to clear its old data from Lustre and Project storage so that the space can be allocated to new projects in the next program year. The project’s PI or designated data manager is responsible for this task.

Several scenarios are possible:

Volatile or working data should be deleted ASAP

Volatile or working data, and any data that has already been copied elsewhere or stored in an archival storage system, should be deleted as soon as possible. This frees space immediately for new allocations.

We ask that the migration and clearing of any data that remains at this point be completed within two months. If additional time will be needed, we ask that the PI or data manager send details of their plan and schedule for clearing the remaining space to lqcd-admin@fnal.gov. We can then discuss the plan and set a reasonable deadline for completing this data housekeeping.

Transfer to a new allocation

If your project continues under a new allocation, any data that is still needed can be carried over into that new allocation.

Data already copied elsewhere

Precious data should already have copies elsewhere. Once the data manager verifies that those copies exist, the data can be deleted from the Lustre storage area.
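One way to verify copies before deleting is to compare checksums. The sketch below assumes the destination site can produce a manifest in the format written by GNU `sha256sum` (one `<hex digest>  <relative path>` line per file, paths relative to the top of the copy); the manifest and the function names are assumptions, not an LQCD-Admin procedure. It returns every local file that is absent from the manifest or whose SHA-256 digest differs, so an empty result means every local file has a matching remote copy.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_manifest(lines):
    """Map relative path -> digest from sha256sum-style lines.

    Handles the text (' ') and binary ('*') mode markers; does not handle
    sha256sum's backslash-escaped names for files with unusual characters.
    """
    expected = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        digest, rest = line.split(" ", 1)
        if rest[:1] in (" ", "*"):
            rest = rest[1:]  # drop the mode marker
        expected[os.path.normpath(rest)] = digest.lower()
    return expected


def unverified_files(root, manifest_lines):
    """List (relative path, reason) for local files lacking a verified copy."""
    expected = parse_manifest(manifest_lines)
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # only regular files carry data worth verifying
            rel = os.path.relpath(path, root)
            want = expected.get(rel)
            if want is None:
                problems.append((rel, "missing from manifest"))
            elif sha256_of(path) != want:
                problems.append((rel, "checksum mismatch"))
    return sorted(problems)
```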

The remaining data should be copied elsewhere

Any precious data that remains should be copied to another site. Globus Online is the fastest and most efficient way to transfer files out of Lustre; more information about our dedicated LQ1 Globus endpoint is available on the Filesystems web page.

The final deadline

Any data that remains three months after the end of the program year is subject to deletion by LQCD-Admin staff unless an extension has been negotiated.
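For planning, the two-month target and the three-month final deadline can be computed from the program year's end date. This sketch assumes both windows are counted from the end of the program year and clamps to the last day of a shorter month; the function names and the end date in the example are hypothetical, and any negotiated extension overrides these dates.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the same day of the month `months` later, clamped to month end."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def housekeeping_deadlines(program_year_end):
    """Assumed: both windows start at the end of the program year."""
    return {
        "target": add_months(program_year_end, 2),  # migration and clearing done
        "final": add_months(program_year_end, 3),   # leftovers subject to deletion
    }
```

For a program year ending on 30 June 2024, `housekeeping_deadlines(date(2024, 6, 30))` gives a target of 30 August 2024 and a final deadline of 30 September 2024.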