Fermilab provides custodial, near-line, long-term storage for tens of petabytes of scientific data. Files on tape can be accessed directly from on-site machines, or from both on-site and off-site machines through a disk-cache front end to the tape storage. The tape storage system is called Enstore. Enstore is integrated with the disk caching software dCache, and the two share a namespace called PNFS.
Enstore is the mass storage system developed and implemented at Fermilab as the primary data store for scientific data sets. It provides access to data on tape to/from a user's machine on-site over the local area network, or over the wide area network through the dCache disk caching system. Enstore is designed to provide high fault tolerance and availability sufficient for the RunII data acquisition and analysis needs, the CMS Tier One facility, and an assortment of other scientific endeavors. It uses a client-server architecture, provides a simple interface for users and allows for hardware and software components to be replaced and/or expanded to meet needs.
Enstore can be used independently or in combination with caching systems such as dCache or SAM. When it is used with a caching/buffering system, files are first written to disk and then migrated to Enstore tapes. For read requests, files that do not reside in the disk cache are first retrieved from Enstore. Direct access to Enstore is limited to on-site machines; dCache is required for off-site access.
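The write and read paths described above can be sketched as a toy model. This is a minimal illustration, not the Enstore or dCache API: the class names `TapeStore` and `DiskCache`, their methods, and the example file name are all hypothetical, and migration to tape happens immediately here, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class TapeStore:
    """Stand-in for the tape back end (hypothetical; not the Enstore API)."""
    files: dict = field(default_factory=dict)
    reads: int = 0

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data

    def read(self, name: str) -> bytes:
        self.reads += 1  # count tape reads, i.e. the slow path
        return self.files[name]


@dataclass
class DiskCache:
    """Stand-in for the disk front end (hypothetical; not the dCache API)."""
    tape: TapeStore
    disk: dict = field(default_factory=dict)

    def write(self, name: str, data: bytes) -> None:
        # Files land on disk first, then are migrated to tape.
        # Assumption: migration is immediate in this sketch.
        self.disk[name] = data
        self.tape.write(name, data)

    def read(self, name: str) -> bytes:
        # Cache miss: stage the file from tape before serving it.
        if name not in self.disk:
            self.disk[name] = self.tape.read(name)
        return self.disk[name]

    def evict(self, name: str) -> None:
        # Dropping the disk copy loses nothing: the tape copy remains.
        self.disk.pop(name, None)
```

In this model a read after eviction triggers exactly one tape read, and later reads of the same file are served from disk.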
Enstore systems at Fermilab include:
- CDFEN for CDF RunII
- D0EN for D0 RunII
- STKEN for all other Fermilab users
dCache is disk caching software developed jointly by DESY, Fermilab and NDGF. dCache can be used independently for high-performance volatile storage, or in conjunction with Enstore as the high-speed front end of a tape-backed hierarchical storage system. In the latter use, dCache decouples the low latency and high speed of network transfer from the high-latency sequential access of tapes and provides high-performance access to frequently accessed files. Whether a file already exists in the disk cache or must first be retrieved from tape is transparent to the user.
Fermilab dCache systems use RAID disk arrays in redundant configurations to reliably store users' files.
Files in dCache can be accessed with several different protocols. Local users can access data through dcap (a POSIX-like interface) and Kerberized FTP. Files can also be accessed via protocols designed specifically for the WAN: SRM and GridFTP. Users who need to access files on tape from off-site computers must do so through dCache.
dCache systems at Fermilab include:
PNFS is a global namespace developed jointly by DESY, Fermilab and NDGF. PNFS is used by both Enstore and dCache to distribute filenames and other storage-related metadata.
PNFS has a UNIX-like file system directory structure and can be mounted on on-site computers, just like NFS, to give access to this metadata. Normal UNIX permissions and administered export points prevent unauthorized access to the namespace. PNFS metadata is also accessed indirectly through the various dCache and Enstore transfer protocols (for example, by specifying a PNFS filename).
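Because the namespace is mounted like an ordinary file system, standard directory and permission calls should in principle work against it. The sketch below uses only the Python standard library and nothing specific to PNFS; the function name `list_entries` and the mount point `/pnfs/example` in the usage note are hypothetical.

```python
import os
import stat


def list_entries(root: str):
    """Yield (path, permission string) for every entry under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # do not follow symbolic links
            # stat.filemode renders the mode bits in `ls -l` style.
            yield path, stat.filemode(info.st_mode)
```

A call such as `for path, mode in list_entries("/pnfs/example"): print(mode, path)` would then show the directory tree with its UNIX permissions, assuming the caller's machine is allowed to mount that export point.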
Files are accessed directly from tape or through dCache by their PNFS name. When users copy a data file from local disk to the Enstore or dCache system, they specify its destination as a PNFS filename. The data file is copied to a storage volume, and a corresponding metadata entry is created in the PNFS directory.
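The relationship between a filename, its metadata entry, and the storage volume can be illustrated with a small model. This is only a sketch: the classes `Namespace` and `StorageSystem`, the metadata fields `volume` and `size`, and the volume label used in the usage note are hypothetical and do not reflect the real PNFS record layout.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy namespace: maps a file name to its metadata (hypothetical fields)."""
    entries: dict = field(default_factory=dict)


@dataclass
class StorageSystem:
    """Toy storage system holding data on labelled volumes."""
    namespace: Namespace
    volumes: dict = field(default_factory=dict)  # volume label -> {name: data}

    def copy_in(self, data: bytes, pnfs_name: str, volume: str) -> None:
        # 1. The data is copied to a storage volume.
        self.volumes.setdefault(volume, {})[pnfs_name] = data
        # 2. A corresponding metadata entry is created in the namespace.
        self.namespace.entries[pnfs_name] = {"volume": volume, "size": len(data)}

    def copy_out(self, pnfs_name: str) -> bytes:
        # The name alone is enough: the metadata says where the data lives.
        meta = self.namespace.entries[pnfs_name]
        return self.volumes[meta["volume"]][pnfs_name]
```

For example, after `copy_in(b"events", "/pnfs/example/run1.dat", "VOL001")`, a later `copy_out("/pnfs/example/run1.dat")` finds the data without the caller knowing which volume holds it.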