Accessing the Fermilab Tape Storage System

The interface to the Fermilab tape storage system is via Enstore. User documentation includes the Enstore/dCache User’s Guide.

Each allocation year, projects are assigned a tape quota set by the USQCD Software Program Committee. The namespace of Enstore files in your project area appears as a UNIX file system mounted at /pnfs/lqcd/projectName, where projectName is the charge name assigned to your project. Enstore and the /pnfs/lqcd area are accessible only from cluster login head nodes such as lq.fnal.gov and from the lqio.fnal.gov data movement node. Files within the Enstore system must first be staged to disk using the dccp command before they can be copied to the /scratch area of the workers, and vice versa.

NOTE: We strongly encourage users to use the lqio.fnal.gov data movement node for moving data between tape, disk, and remote locations. lqio.fnal.gov has a 100GbE network interface and provides access to /lustre1 in addition to tape. This node has a much faster network connection than the cluster login node, which allows better throughput. Performing your transfers on the data movement node also avoids slowing down other users on the cluster login head node.

By default, your files will be grouped together onto tapes that hold only files from the projectName “file family”. Enstore file families are explained in the section below. When you delete a file (with rm), only the metadata is removed; the file remains on tape. Once all of the files on a tape have been deleted, you may request that the tape be recycled. Your allocation is charged when the first file on a tape is written, and you receive a refund if the tape is recycled. If you would like to have different groups of files (some that are archival and never deleted, and others that can be deleted and their containing tapes recycled), email us at lqcd-admin@fnal.gov and we will set you up with multiple file families. File families are tied to subdirectories.

Tape data staging tips

Overview

Getting data off tape storage can be time consuming. The tape drives are a limited and shared resource. When you use ‘dccp’ (or ‘encp’) to request a file from tape, the storage system has to locate the tape and move it to an available tape drive. The mover node then has to locate the file on the tape and stage it to disk (into a dCache “read pool”) so it can be copied elsewhere.

Prestaging files

By first submitting requests to prestage files to disk, you let the system fetch each file into a dCache read pool ahead of time. Use the command ‘dccp -P -t 3600’ followed by the path of the file to prestage it. This command only schedules a prestage request and returns immediately, so you can then submit other prestage requests. The real advantage comes when you request several files from a single tape: the dCache system can pull all of the requested files while mounting the tape just once. Once all prestage requests for a given tape are completed, the tape can be unmounted and dCache will move on to another tape. This is much more efficient than individual file requests.
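To queue many prestage requests at once, the command above can be wrapped in a small loop. The following is a minimal sketch, not a site-provided tool: prestage_list is a hypothetical helper name, and the list file (one /pnfs path per line) is an assumed convention. Only the dccp flags are taken from the text above.

```shell
# prestage_list: hypothetical helper that queues one prestage request
# per path listed in a text file (one /pnfs path per line).
prestage_list() {
    local list_file="$1"
    local path
    # Read the list line by line; skip blank lines and '#' comments.
    while IFS= read -r path || [ -n "$path" ]; do
        case "$path" in
            ''|'#'*) continue ;;
        esac
        # Each call only schedules the request and returns immediately.
        # stdin is redirected so dccp cannot consume the rest of the list.
        dccp -P -t 3600 "$path" < /dev/null \
            || echo "prestage request failed: $path" >&2
    done < "$list_file"
}
```

For example, "prestage_list my_files.txt" would submit a request for every path in my_files.txt. Listing files that live on the same tape together lets the system serve them with a single mount.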

Testing if a file is in cache

You can use the ‘dccp -P -t -1’ command to query whether your file is in the dCache “read pool”. The command exits with status 0 if the file is in cache and with status 255 if it is not. After a prestage request, it can take anywhere from a few minutes to 1 or 2 hours to retrieve the file from tape.
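A script can use that exit status to wait for a file before copying it. The sketch below assumes two hypothetical helper names, is_cached and wait_until_cached. The polling interval and retry count are illustrative defaults (24 checks, 5 minutes apart, roughly matching the 2-hour upper bound mentioned above), not site recommendations.

```shell
# is_cached: hypothetical helper; succeeds when dccp reports that the
# file is already in the dCache read pool.
is_cached() {
    dccp -P -t -1 "$1" > /dev/null 2>&1
}

# wait_until_cached: poll until the file is cached or the checks run out.
# Arguments: path, seconds between checks, maximum number of checks.
wait_until_cached() {
    local path="$1"
    local interval="${2:-300}"
    local max_tries="${3:-24}"
    local try=0
    while :; do
        is_cached "$path" && return 0
        try=$((try + 1))
        # Give up once the allowed number of checks has been made.
        [ "$try" -ge "$max_tries" ] && return 1
        sleep "$interval"
    done
}
```

A typical use would be "wait_until_cached /pnfs/lqcd/projectName/subdir/file && dccp /pnfs/lqcd/projectName/subdir/file /destdir/file", so the copy starts only once the file has been staged.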

If you have trouble accessing files

If you attempt to pre-stage a file and run into problems, please send email to lqcd-admin@fnal.gov. Be sure to include the full path name to the files you are having problems with. We will then check the files and enlist help from Fermilab’s Storage Server Admin team as needed.

Managing /pnfs/lqcd Project Area

/pnfs/lqcd/projectName looks like a standard directory, but you have to use the special command dccp to copy files in and out of this area. However, you can use file and directory manipulation commands, such as find, stat, mv, rm, mkdir, rmdir, chmod, chgrp, chown, etc., to locate tape files, print an inode’s content, rename and delete files, manipulate subdirectories, change permissions, and so forth. Using standard UNIX commands means scripts to manage tape files are almost unchanged from scripts that manage disk files.

dccp has the semantics of cp, except that wildcards are not allowed. So, you will need to script the sequence of dccp commands to copy your files into /pnfs/lqcd/projectName.

In dccp commands, the source and destination directories will determine whether files are copied to tape or read from tape.

  • Commands like “dccp file /pnfs/lqcd/projectName/subdir/file” will copy a file to tape.
  • Commands like “dccp /pnfs/lqcd/projectName/subdir/file /destdir/file” will copy a file from tape.
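Because each dccp call handles a single source and a single destination, copying a directory of files means looping over them. The following sketch shows one way to script that. copy_to_tape is a hypothetical helper name, and the directory arguments are placeholders for your own paths.

```shell
# copy_to_tape: hypothetical helper that copies every regular file in
# src_dir to dest_dir (a directory under /pnfs/lqcd/projectName),
# issuing one dccp call per file.
copy_to_tape() {
    local src_dir="$1"
    local dest_dir="$2"
    local f
    local failed=0
    # The shell expands the glob here; dccp only ever receives one
    # source path and one destination path.
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and empty globs
        if ! dccp "$f" "$dest_dir/$(basename "$f")"; then
            echo "copy failed: $f" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Swapping the roles of the two directories in the dccp call gives the corresponding loop for reading files back from tape.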

dccp uses a disk layer (dCache) to cache files on their way to or from tape. When writing to tape, the file is first stored on a dCache disk and is subsequently written to tape as soon as practical. A dccp command writing to tape will return success as soon as the file has been transferred to disk. When reading from tape, dccp will not return until the tape has been mounted by the tape robot and the file has been read to disk.

If you are planning to write a lot of small files to tape, we highly recommend compressing them into a single tarball, so that one larger file is written instead of many small ones. This makes efficient use of tape. If in doubt, please do not hesitate to email us with any questions about writing data to tape.
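As an illustration of the tarball approach, the sketch below packs a directory into one gzip-compressed archive and writes that single file to tape. archive_to_tape is a hypothetical helper name, and the work directory is assumed to be a local disk area with enough room for the archive.

```shell
# archive_to_tape: hypothetical helper that packs src_dir into a single
# gzip-compressed tarball in work_dir, then writes it to tape with dccp.
archive_to_tape() {
    local src_dir="$1"     # directory containing the small files
    local work_dir="$2"    # local disk area with room for the tarball
    local dest_dir="$3"    # directory under /pnfs/lqcd/projectName
    local name
    name="$(basename "$src_dir").tar.gz"
    # -C makes the paths stored in the archive relative to the parent
    # of src_dir, so the tarball unpacks into a single directory.
    tar -czf "$work_dir/$name" -C "$(dirname "$src_dir")" \
        "$(basename "$src_dir")" || return 1
    dccp "$work_dir/$name" "$dest_dir/$name"
}
```

For example, "archive_to_tape /scratch/run01 /scratch /pnfs/lqcd/projectName/subdir" would produce /scratch/run01.tar.gz and copy it to tape as one file; the paths here are placeholders.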

Enstore file families

The Enstore system keeps metadata for tape files beyond the standard UNIX inode information. The command below displays this extra information for the projectName area:

$ enstore pnfs --tags /pnfs/lqcd/projectName
.(tag)(library) = 9940
.(tag)(file_family) = lqcd
.(tag)(file_family_wrapper) = cpio_odc
.(tag)(file_family_width) = 1
.(tag)(storage_group) = lqcd
-rw-rw-r-- 11 11072 9540 4 Feb 11 14:29 /pnfs/lqcd/projectName/.(tag)(library)
-rw-rw-r-- 11 11072 9540 4 Feb 11 14:30 /pnfs/lqcd/projectName/.(tag)(file_family)
-rw-rw-r-- 11 11072 9540 8 Feb 11 14:31 /pnfs/lqcd/projectName/.(tag)(file_family_wrapper)
-rw-rw-r-- 11 11072 9540 1 Feb 11 14:31 /pnfs/lqcd/projectName/.(tag)(file_family_width)
-rw-rw-r-- 11 root root 4 Feb 11 14:31 /pnfs/lqcd/projectName/.(tag)(storage_group)

The file_family tag offers a convenient way for a group to organize its data on tape. The file_family specifies the logical name for a set of tapes. File families may be specified on a per-directory basis and are, by default, inherited from the parent directory when a new directory is created. Enstore maintains files belonging to separate file families on separate sets of tape cartridges. This feature facilitates shelving or removing little-used data sets from the robot without affecting other data sets.
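When several file families are in use, a script may want to confirm which family a subdirectory writes into before copying data there. The sketch below extracts one tag value from output in the format shown in the listing above. tag_value is a hypothetical helper name, and the parsing assumes the ".(tag)(name) = value" layout of that listing.

```shell
# tag_value: hypothetical helper that reads `enstore pnfs --tags` output
# on stdin and prints the value of the named tag.
tag_value() {
    local tag="$1"
    # Match only lines of the form ".(tag)(<name>) = <value>"; the
    # ls-style lines in the same output start with a permission string
    # and are therefore ignored.
    awk -v key=".(tag)($tag)" '$1 == key && $2 == "=" { print $3 }'
}

# Usage (run on a node where /pnfs/lqcd is mounted):
#   enstore pnfs --tags /pnfs/lqcd/projectName | tag_value file_family
```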