Hardware

Current

| Cluster | Processor | Nodes | Cores | Per-Node RAM | DWF Performance (GFlops/node) | asqtad Performance (GFlops/node) | In Service |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LQ1 | Dual 20-core 2.5 GHz Intel Xeon CPUs | 179 | 7,160 | 196 GB | 370.0 | 280.0 | 2020-present |
| LQ2 | Quad NVIDIA A100-80 GPUs and dual 32-core 2.8 GHz AMD CPUs | 18 | CPU: 1,152 / GPU: 72 | 1 TB | 4524.5* | 1357.0* | 2023-present |
*The LQ2 worker node performance figures are the four-GPU Dslash kernel performance.

The table above shows the measured performance of DWF and asqtad inverters on the current Fermilab LQCD clusters.

LQ1: 179-node cluster with dual-socket 20-core Intel Xeon Gold 6248 “Cascade Lake” (2.5 GHz) processors and an EDR Omni-Path fabric; 196 GB of RAM per node.
LQ2: 18-node cluster with quad NVIDIA A100 GPUs with 80 GB of HBM2e memory per GPU; NVLink point-to-point mesh interconnecting the GPUs; dual InfiniBand interfaces with 200 Gbps aggregate bandwidth; dual 3rd-generation AMD EPYC 7543 32-core 2.8 GHz processors; 1 TB of system RAM per node.
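For a rough sense of total capacity, the per-node figures in the table can be scaled by the node counts. The short Python sketch below does exactly that; the inputs are copied from the table, and the assumption of perfectly linear scaling across nodes is an idealization used only for illustration.

```python
# Rough aggregate sustained performance for the current clusters, assuming
# the per-node inverter figures from the table scale linearly with node count.
# (Illustration only; real multi-node jobs lose some efficiency to communication.)

clusters = {
    # name: (nodes, DWF GFlops/node, asqtad GFlops/node)
    "LQ1": (179, 370.0, 280.0),
    "LQ2": (18, 4524.5, 1357.0),  # four-GPU Dslash kernel figures
}

for name, (nodes, dwf, asqtad) in clusters.items():
    # Convert GFlops to TFlops for readability.
    print(f"{name}: ~{nodes * dwf / 1000:.1f} TFlops DWF, "
          f"~{nodes * asqtad / 1000:.1f} TFlops asqtad aggregate")
```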

Retired

| Cluster | Processor | Nodes | Cores | DWF Performance (GFlops/node) | asqtad Performance (GFlops/node) | In Service |
| --- | --- | --- | --- | --- | --- | --- |
| qcd | 2.8 GHz Single CPU Single Core P4E | 127 | 127 | 1.400 | 1.017 | 2002-2010 |
| pion | 3.2 GHz Single CPU Single Core Pentium 640 | 486 | 486 | 1.729 | 1.594 | 2004-2010 |
| kaon | 2.0 GHz Dual CPU Dual Core Opteron | 600 | 2,400 | 4.703 | 3.832 | 2006-2013 |
| jpsi | 2.1 GHz Dual CPU Quad Core Opteron | 856 | 6,848 | 10.06 | 9.563 | 2008-2014 |
| ds | 2.0 GHz Quad CPU Eight Core Opteron | 420 | 13,440 | 51.52 | 50.55 | 2010-2020 |
| bc | 2.8 GHz Quad CPU Eight Core Opteron | 224 | 7,168 | 57.41 | 56.22 | 2013-2020 |
| pi0 | 2.6 GHz Dual CPU Eight Core Intel | 314 | 1,152 | 78.31 | 61.49 | 2014-2020 |

The table above shows the measured performance of DWF and asqtad inverters on the retired Fermilab LQCD clusters. For qcd and pion, the asqtad numbers were taken on 64-node runs with a 14^4 local lattice per node, and the DWF numbers were taken on 64-node runs using Ls=16, averaging the performance of 32x8x8x8 and 32x8x8x12 local-lattice runs. The DWF and asqtad performance figures for kaon use 128-process (32-node) runs, with 4 processes per node (one process per core). The figures for jpsi use 128-process (16-node) runs, with 8 processes per node (one process per core). The figures for ds and bc use 128-process (4-node) runs, with 32 processes per node (one process per core).
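As a small illustration of the benchmark accounting in the paragraph above, the sketch below computes the number of MPI ranks and the implied total lattice volume for the qcd/pion run layouts. The geometry values come from the text; the helper itself is a hypothetical convenience written for this page, and it simply treats the DWF fifth dimension (Ls) as an extra factor in the local volume.

```python
# Benchmark geometry accounting: ranks = nodes * ranks_per_node, and the total
# lattice volume is the local volume per rank times the number of ranks.
# Illustrative sketch only, not part of any benchmark code.

def run_summary(name, nodes, ranks_per_node, local_dims):
    ranks = nodes * ranks_per_node
    local_sites = 1
    for d in local_dims:
        local_sites *= d
    print(f"{name}: {ranks} ranks, {local_sites:,} sites/rank, "
          f"{ranks * local_sites:,} sites total")

# asqtad on qcd/pion: 64 nodes, one rank per (single-core) node, 14^4 local lattice.
run_summary("asqtad 14^4", nodes=64, ranks_per_node=1, local_dims=(14, 14, 14, 14))

# DWF on qcd/pion: 64 nodes, averaging 32x8x8x8 and 32x8x8x12 local lattices,
# with Ls = 16 counted here as a fifth local dimension.
for dims in ((32, 8, 8, 8), (32, 8, 8, 12)):
    run_summary(f"DWF {'x'.join(map(str, dims))} Ls=16",
                nodes=64, ranks_per_node=1, local_dims=dims + (16,))
```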

qcd: 120-node cluster (decommissioned April 2010) with single-socket 2.8 GHz Pentium 4 processors and a Myrinet fabric.
pion: 486-node cluster (decommissioned April 2010) with single-socket 3.2 GHz Pentium 640 processors and an SDR InfiniBand fabric.
kaon: 600-node cluster (decommissioned August 2013) with dual-socket dual-core Opteron 270 (2.0 GHz) processors and a DDR Mellanox InfiniBand fabric.
jpsi: 856-node cluster (decommissioned May 19, 2014) with dual-socket quad-core Opteron 2352 (2.1 GHz) processors and a DDR Mellanox InfiniBand fabric.
ds: 420-node cluster (224 nodes decommissioned August 2016, 196 nodes decommissioned April 2020) with quad-socket eight-core Opteron 6128 (2.0 GHz) processors and a QDR Mellanox InfiniBand fabric.
dsg: 76-node cluster (decommissioned April 2020) with dual-socket four-core Intel Xeon E5630 processors, two NVIDIA Tesla M2050 GPUs per node, and a QDR Mellanox InfiniBand fabric.
bc: 224-node cluster (decommissioned April 2020) with quad-socket eight-core Opteron 6320 (2.8 GHz) processors and a QDR Mellanox InfiniBand fabric.
pi0: 314-node cluster (decommissioned and repurposed April 2020) with dual-socket eight-core Intel E5-2650v2 “Ivy Bridge” (2.6 GHz) processors and a QDR Mellanox InfiniBand fabric.
pi0g: 32-node cluster (decommissioned and repurposed April 2020) with dual-socket eight-core Intel E5-2650v2 “Ivy Bridge” (2.6 GHz) processors, four NVIDIA Tesla K40m GPUs per node, and a QDR Mellanox InfiniBand fabric.