Chapter 1: Introduction to the Mass Storage System

The mass storage system at Fermilab has three major components:

• Enstore, the principal mass storage component. Enstore provides access to data on tape or other storage media, whether local to a user’s machine or accessed over a network.

• A namespace (implemented by PNFS), which presents the storage media library contents as though the data files existed in a hierarchical UNIX file system.

• dCache, a data file caching system; dCache is implemented as a front-end to Enstore.

1.1 About Enstore


Enstore is the mass storage system implemented at Fermilab as the primary data store for large data sets. Its design was inspired by the mass storage architecture at DESY, and it originated from discussions with the DESY designers. Enstore is designed to provide the high fault tolerance and availability required for RunII data acquisition needs, as well as easy administration and monitoring. It uses a client-server architecture that provides a generic interface for users and allows hardware and software components to be replaced and/or expanded.

Enstore has two major kinds of software components:

• the Enstore servers, software modules that each have a specific function, e.g., maintaining the database of data files, maintaining the database of storage volumes, maintaining the configuration, watching for error conditions and sounding alarms, communicating user requests down the chain to the tape robots, and so on. See Chapter 8: Overview of the Enstore Servers.

• encp, a program for copying files directly to and from the mass storage system. See Chapter 6: Copying Files with Encp.

Enstore can be used directly only from on-site machines. Off-site users are restricted to accessing Enstore via dCache, and in fact on-site users are encouraged to go through dCache as well.

Enstore supports both automated and manual storage media libraries, and it can manage more storage volumes than a library has slots. It also allows simultaneous access to multiple volumes through automated media libraries. There is no preset upper limit on the size of a data file in the Enstore system; the actual size is limited only by the physical resources. The lower limit on file size is zero. The upper limit on the number of files that can be stored on a single volume is about 5000.

Enstore allows users to search and list the contents of media volumes as easily as they search native file systems. The stored files appear to the user as though they exist in a mounted UNIX directory; this directory is actually a distributed virtual file system in the PNFS namespace containing metadata for each stored file. Enstore thus eliminates the need to know volume names or other details about the actual file storage.

Users typically access Enstore via the dCache caching system. The protocols supported by dCache include dccp, gridftp (globus-url-copy), Kerberized ftp, and weakly authenticated ftp (these are described in Chapter 5: Using the dCache to Copy Files to/from Enstore). On-site users may bypass dCache and use the encp program, the Enstore copy command roughly modeled on UNIX’s cp, to copy files directly to and from storage media.
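For illustration, a direct encp transfer looks roughly like the following sketch (the experiment name and paths here are hypothetical; see Chapter 6 for the actual options your experiment uses):

    % encp myfile.dat /pnfs/myexpt/rawdata/myfile.dat    # write a file to Enstore
    % encp /pnfs/myexpt/rawdata/myfile.dat myfile.dat    # read it back from Enstore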

There are several installed Enstore systems at Fermilab; currently these include CDFEN for CDF RunII, D0EN for D0 RunII, and STKEN for all other Fermilab users. Web-based monitoring for the Enstore systems is available at http://hppc.fnal.gov/enstore/. At present, all storage libraries are tape libraries. The Computing Division operates and maintains the tape robots, slots, and other tape equipment, but for now experiments provide and manage their own volumes.

1.2 PNFS Namespace


PNFS is a virtual file system package, written at DESY, that implements the Enstore namespace. PNFS is mounted like NFS, but it is a virtual file system only. It maintains file grouping and structure information via a set of tags in each directory. The encp program communicates this information between PNFS and the Enstore servers when it uploads or downloads a data file.
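For example, directory tags can be read through PNFS’s special “magic” file names; a minimal sketch, assuming a hypothetical experiment directory:

    % cd /pnfs/myexpt/rawdata
    % cat ".(tags)()"              # list the tag files defined in this directory
    % cat ".(tag)(file_family)"    # print the value of the file_family tag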

PNFS can only be mounted on machines that are physically at the lab [1]. When a user copies a data file from disk to the Enstore system, he or she specifies its destination in terms of a PNFS directory. The data file gets copied to a storage volume (selected according to the tags of the specified PNFS directory), and a corresponding metadata entry is created in the PNFS directory. This entry takes the name given on the encp command line or in the protocol-specific dCache command. It contains metadata about the data file, including information about the file transfer, the data storage volume on which the data file resides, the file’s location on the volume, and so on.
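The stored metadata can be inspected through the same magic-name mechanism; for instance, layer 4 of a file’s PNFS entry records the Enstore volume and location information (the file name below is hypothetical):

    % cat ".(use)(4)(myfile.dat)"    # volume label, location cookie, file size, file family, ...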

To browse file entries in the Enstore system, on-site users can mount their experiment’s PNFS storage area on their own computers and interact with it using standard non-I/O UNIX operating system utilities (see section 4.1 UNIX Commands You can Use in PNFS Space), as shown in the sketch below. Normal UNIX permissions and administered export points are used to prevent unauthorized access to the namespace.
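For example (the paths are hypothetical):

    % ls -l /pnfs/myexpt/rawdata          # list stored files like ordinary directory entries
    % find /pnfs/myexpt -name "*.dat"     # search the namespace as if it were a local file system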

1.3 dCache


1.3.1 Overview

The dCache was originally designed as a front-end for a set of Hierarchical Storage Managers (HSMs), namely Enstore, EuroGate and DESY’s OSM. (It has since been further developed and can be deployed stand-alone; we do not address the stand-alone functionality in this manual.) When used as a front-end to an HSM, dCache can be viewed as an intermediate “relay station” between client applications and the HSM (Enstore, in our case). Client systems communicate with dCache via any of a number of protocols, listed in section 1.3.3 Protocols for Communicating with dCache. dCache communicates with Enstore (in a manner transparent to the user) via a high-speed ethernet connection. The dCache decouples the potentially slow network transfer (to and from client machines) from the fast storage media I/O in order to keep Enstore from bogging down.

Data files uploaded to the dCache from a user’s machine are stored on highly reliable RAID disks pending transfer to Enstore. Files already written to storage media that get downloaded to the dCache from Enstore are stored on ordinary disks.

The dCache is installed at Fermilab on a server machine on which the /pnfs root area is mounted. Since the PNFS namespace can only be mounted on machines in the fnal.gov domain, off-site users may only access Enstore via the dCache. On-site users are strongly encouraged to go through the dCache as well. We discuss dCache in more depth in Chapter 5: Using the dCache to Copy Files to/from Enstore.
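As a sketch, dccp transfers through a dCache door might look like the following (the door host, port number, and paths are hypothetical; see Chapter 5 for your experiment’s actual doors):

    % dccp myfile.dat dcap://door.fnal.gov:24125/pnfs/fnal.gov/usr/myexpt/rawdata/myfile.dat    # write
    % dccp dcap://door.fnal.gov:24125/pnfs/fnal.gov/usr/myexpt/rawdata/myfile.dat myfile.dat    # read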

Read more general information about the dCache at the DESY site: http://www-dcache.desy.de.

1.3.2 Advantages

The principal advantages of using the dCache are:

• Optimized usage of existing tape drives due to transfer rate adaptation.

• Possible usage of slower and cheaper drive technology without overall performance reduction.

• Optimized usage of the robot systems by coordinated read and write requests.

• Better usage of network bandwidth by choosing the best location from which to serve the data.

• No explicit staging required to access the data.

• Ability to do POSIX-like I/O reads and writes to data files instead of transferring entire files.

• Working ROOT interfaces.

• Tapeless data handling: raw data can flow to reconstruction, to analysis, and to users without explicit tape staging.

• Data is written to tape as a by-product, so there are no tape delays.

• PNFS does not have to be mounted for access to the data.

• The same access to the storage system on site and off site, with strong authentication (both GSS and GSI) and both native and ftp access to the data.

• Uniform access methods for data, independent of the media on which the data resides.

• Even without the back-end HSM (e.g., Enstore), the dCache system could be seen as a huge data store with a unique namespace and standardized access methods. Care will be taken that valuable data resides on safe disks as long as no HSM copy exists. Back-end storage to the HSM can be done regularly (policy based) or by manual intervention only.

• A joint DESY-FNAL effort makes the use of manpower more efficient and guarantees continued support and maintenance of the developed software.

1.3.3 Protocols for Communicating with dCache

Whenever an application needs to talk to the dCache, it has to choose an appropriate door into the system. There are a number of different dCache doors through which users and applications can send requests to Enstore. From the dCache point of view, doors are protocol converters, and they are responsible for strong authentication as necessary. One door may provide Kerberized ftp read/write access, another dcap (the dCache native C API), another gridftp, another weakly authenticated ftp read-only access, and so on. Each experiment determines which door(s) its experimenters may use, and communicates this information to the Enstore administrators who manage the doors’ configurations. Most doors are for native transfers, and are local. See Chapter 5: Using the dCache to Copy Files to/from Enstore for more information.
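For illustration, a transfer through a gridftp door might be issued as follows (the door host, port, and paths are hypothetical):

    % globus-url-copy file:///home/user/myfile.dat \
        gsiftp://door.fnal.gov:2811/pnfs/fnal.gov/usr/myexpt/rawdata/myfile.dat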

[1] There are some exceptions; arrangements for PNFS mounting have been made for some experiments whose systems are managed by the Computing Division, e.g., soudan.org for Minos.