Fermilab provides custodial, near-line, long-term storage for tens of petabytes of scientific data. Files on tape can be accessed directly from on-site machines, or from on-site and off-site machines through a disk-cache front end to the tape storage. The tape storage system, called Enstore, was developed by Fermilab. Enstore is integrated with the disk caching software dCache, and the two share a namespace called PNFS/Chimera.
Enstore is the mass storage system developed and implemented at Fermilab as the primary data store for scientific data sets. It provides access to data on tape to/from a user's machine on-site over the local area network, or over the wide area network through the dCache disk caching system. Enstore is designed to provide high fault tolerance and availability sufficient for the Run II data acquisition and analysis needs, the CMS Tier One facility, and an assortment of other scientific endeavors. It uses a client-server architecture, provides a simple interface for users and allows for hardware and software components to be replaced and/or expanded to meet needs.
Enstore can be used independently or in combination with caching systems such as dCache or SAM. When it is used with a caching/buffering system, files are written to disk first and then migrated to Enstore tapes. For read requests, files that do not reside in the disk cache are first retrieved from Enstore. Direct access to Enstore is limited to on-site machines; dCache is required for off-site access.
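The write-then-migrate and read-through behaviour described above can be sketched with a small in-memory model. This is only an illustration of the general pattern, not Enstore or dCache code: the class names, method names, and the example path are all hypothetical.

```python
class TapeStore:
    """Hypothetical stand-in for the tape back end (in-memory only)."""

    def __init__(self):
        self._files = {}
        self.reads = 0  # counts how often a read had to go to "tape"

    def write(self, name, data):
        self._files[name] = data

    def read(self, name):
        self.reads += 1
        return self._files[name]


class DiskCache:
    """Hypothetical disk-cache front end sitting in front of a TapeStore."""

    def __init__(self, tape):
        self.tape = tape
        self._disk = {}
        self._pending = set()  # files on disk that are not yet on tape

    def write(self, name, data):
        # New files land on disk first and are migrated to tape later.
        self._disk[name] = data
        self._pending.add(name)

    def migrate(self):
        # Copy every pending file to tape, then mark it as safely stored.
        for name in sorted(self._pending):
            self.tape.write(name, self._disk[name])
        self._pending.clear()

    def evict(self, name):
        # Only files that already exist on tape may leave the disk cache.
        if name in self._pending:
            raise RuntimeError(f"{name} has not been migrated to tape yet")
        self._disk.pop(name, None)

    def read(self, name):
        # Cache miss: stage the file from tape, then serve it from disk.
        if name not in self._disk:
            self._disk[name] = self.tape.read(name)
        return self._disk[name]
```

In this sketch a caller always uses `DiskCache.read` and never needs to know whether the file was already on disk; only the `reads` counter on the tape stand-in reveals when a request had to be staged from tape.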
Enstore systems at Fermilab include:
- CDFEN for CDF Run II
- D0EN for D0 Run II
- STKEN for all other Fermilab users, including US-CMS T1
dCache is disk caching software developed jointly by DESY, Fermilab and NDGF. dCache can be used independently for high-performance volatile storage, or in conjunction with Enstore as the high-speed front end of a tape-backed hierarchical storage system. In the latter use, dCache decouples low-latency, high-speed network transfers from the high-latency sequential access of tapes, and provides high-performance access to frequently accessed files. Whether a file already exists in the disk cache or must first be retrieved from tape is transparent to the user.
Fermilab dCache systems use RAID disk arrays in redundant configurations to store users' files reliably.
Files in dCache can be accessed with several different protocols. Local users can access data through dcap (a POSIX-like interface), Kerberized FTP, and NFSv4.1. Files can also be accessed via protocols designed specifically for the WAN: SRM and GridFTP. Users who need to access files on tape from off-site computers must do so through dCache.
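The access rules above can be summarised as a small lookup. The function name, dictionary keys, and protocol labels below are hypothetical, and two points are assumptions rather than statements from the text: that on-site clients may also use the WAN protocols, and that off-site clients are limited to them.

```python
# Protocol labels are illustrative strings, not configuration values.
LOCAL_PROTOCOLS = ("dcap", "kerberized-ftp", "nfs4.1")
WAN_PROTOCOLS = ("srm", "gridftp")


def access_options(on_site):
    """Return the access paths available to a client (hypothetical helper)."""
    if on_site:
        return {
            # Direct tape access is limited to on-site machines.
            "direct_enstore": True,
            # Assumption: on-site clients may use the WAN protocols as well.
            "dcache_protocols": LOCAL_PROTOCOLS + WAN_PROTOCOLS,
        }
    return {
        # Off-site clients must go through dCache.
        "direct_enstore": False,
        # Assumption: off-site clients use only the WAN-oriented protocols.
        "dcache_protocols": WAN_PROTOCOLS,
    }
```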
dCache systems at Fermilab include:
PNFS/Chimera is a global namespace developed jointly by DESY, Fermilab and NDGF. PNFS is used by both Enstore and dCache to distribute file names and other storage-related metadata. In addition, Chimera supports direct access to files in dCache through NFSv4.1.
PNFS/Chimera has a directory structure like that of a Unix file system, and it can be mounted on on-site computers just like NFS to access this metadata. Normal Unix permissions and administered export points prevent unauthorized access to the namespace. PNFS metadata is also accessed indirectly through the various dCache and Enstore transfer protocols (for example, by specifying a PNFS file name).
Files are accessed directly from tape or through dCache by their PNFS/Chimera name. When users copy a data file from local disk to the Enstore or dCache system, they specify its destination as a PNFS file name. The data file is copied to a storage volume, and a corresponding metadata entry is created in the PNFS directory.
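The relationship between the namespace entry and the stored data can be illustrated with a minimal in-memory sketch. It is not the PNFS/Chimera implementation: the record fields, function names, volume label, and example path are all hypothetical, chosen only to show that the namespace holds metadata while the data itself lives on a storage volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata kept in the namespace for one file."""
    volume: str  # label of the storage volume holding the data
    size: int    # file size in bytes


class Namespace:
    """Hypothetical namespace mapping file names to metadata records."""

    def __init__(self):
        self._entries = {}

    def __contains__(self, path):
        return path in self._entries

    def register(self, path, record):
        self._entries[path] = record

    def lookup(self, path):
        return self._entries[path]


def store_file(namespace, volumes, path, data, volume):
    """Copy data to a volume and create the matching namespace entry."""
    if path in namespace:
        raise FileExistsError(path)
    # The data goes to the storage volume...
    volumes.setdefault(volume, {})[path] = data
    # ...and only the metadata goes into the namespace.
    namespace.register(path, FileRecord(volume=volume, size=len(data)))


def retrieve_file(namespace, volumes, path):
    """Resolve the name to a volume via the namespace, then read the data."""
    record = namespace.lookup(path)
    return volumes[record.volume][path]
```

A caller would use it like this, with the destination given purely as a namespace path:

```python
ns, vols = Namespace(), {}
store_file(ns, vols, "/pnfs/example/run1.dat", b"events", volume="VOL001")
data = retrieve_file(ns, vols, "/pnfs/example/run1.dat")
```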