Current
Cluster | Processor | Nodes | Cores | Per-Node RAM | DWF Performance (GFlops/node) | asqtad Performance (GFlops/node) | In Service |
LQ1 | Dual 20-core 2.5GHz Intel Xeon CPUs | 179 | 7,160 | 196 GB | 370.0 | 280.0 | 2020-present |
LQ2 | Quad NVIDIA A100-80 GPUs and Dual 32-core 2.8GHz AMD CPUs | 18 | CPU:1,152 GPU:72 | 1 TB | 4524.5* | 1357.0* | 2023-present |
The table above shows the measured performance of DWF and asqtad inverters on the Fermilab LQCD clusters.
LQ1: 179-node cluster with dual-socket 20-core Intel 6248 “Cascade Lake” (2.5 GHz) processors and an EDR Omni-Path fabric; 196 GB of RAM per node
LQ2: 18-node cluster with quad NVIDIA A100 GPUs, each with 80 GB of HBM2e memory; NVLink point-to-point mesh interconnecting the GPUs; dual InfiniBand interfaces with 200 Gbps aggregate bandwidth; dual 3rd generation AMD EPYC 7543 32-core 2.8 GHz processors; 1 TB of system RAM per node
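As a rough illustration of how the per-node figures in the table relate to whole-cluster capacity, the sketch below multiplies each per-node number by the node count. It assumes perfect scaling across every node, which real runs will not reach, so the results are naive upper bounds rather than measurements. The function name `aggregate_tflops` is hypothetical, and the asterisked LQ2 values are used as printed, without whatever qualification the asterisk carries.

```python
# Per-node figures copied from the current-cluster table (GFlops/node).
# The LQ2 values are asterisked in the table; they are used here as printed.
clusters = {
    "LQ1": {"nodes": 179, "dwf": 370.0, "asqtad": 280.0},
    "LQ2": {"nodes": 18, "dwf": 4524.5, "asqtad": 1357.0},
}


def aggregate_tflops(nodes, gflops_per_node):
    """Naive upper bound in TFlops: assumes perfect scaling over all nodes."""
    return nodes * gflops_per_node / 1000.0


for name, spec in clusters.items():
    dwf = aggregate_tflops(spec["nodes"], spec["dwf"])
    asqtad = aggregate_tflops(spec["nodes"], spec["asqtad"])
    print(f"{name}: DWF ~{dwf:.2f} TFlops, asqtad ~{asqtad:.2f} TFlops")
```

Under this assumption the 18 GPU nodes of LQ2 would exceed the 179 CPU nodes of LQ1 on DWF, but fall below them on asqtad.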
Retired
Cluster | Processor | Nodes | Cores | DWF Performance (GFlops/node) | asqtad Performance (GFlops/node) | In Service |
qcd | 2.8GHz Single CPU Single Core P4E | 127 | 127 | 1.400 | 1.017 | 2002-2010 |
pion | 3.2GHz Single CPU Single Core Pentium 640 | 486 | 486 | 1.729 | 1.594 | 2004-2010 |
kaon | 2.0GHz Dual CPU Dual Core Opteron | 600 | 2,400 | 4.703 | 3.832 | 2006-2013 |
jpsi | 2.1GHz Dual CPU Quad Core Opteron | 856 | 6,848 | 10.06 | 9.563 | 2008-2014 |
ds | 2GHz Quad CPU Eight Core Opteron | 420 | 13,440 | 51.52 | 50.55 | 2010-2020 |
bc | 2.8GHz Quad CPU Eight Core Opteron | 224 | 7,168 | 57.41 | 56.22 | 2013-2020 |
pi0 | 2.6GHz Dual CPU Eight Core Intel | 314 | 5,024 | 78.31 | 61.49 | 2014-2020 |
The table above shows the measured performance of DWF and asqtad inverters on the retired Fermilab LQCD clusters. For qcd and pion, the asqtad numbers were taken on 64-node runs with a 14^4 local lattice per node; the DWF numbers were taken on 64-node runs using Ls=16, averaging the performance of runs with 32x8x8x8 and 32x8x8x12 local lattices. For kaon, the DWF and asqtad figures come from 128-process (32-node) runs with 4 processes per node, one process per core. For jpsi, they come from 128-process (16-node) runs with 8 processes per node, one process per core. For ds and bc, they come from 128-process (4-node) runs with 32 processes per node, one process per core.
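The run layouts described above can be cross-checked with a little arithmetic. The sketch below confirms that nodes times processes per node gives the stated 128 processes, that one process per core matches the per-node core counts implied by the Processor column, and it works out the local lattice sizes quoted for qcd and pion. The helper names `layout_is_consistent` and `local_sites` are hypothetical, and treating the DWF local volume as the 4D volume times Ls is the usual convention rather than something stated on this page.

```python
from math import prod

# (cluster, processes, nodes, processes per node, cores per node)
# Cores per node follow from the Processor column, e.g. dual CPU x dual core = 4.
runs = [
    ("kaon", 128, 32, 4, 2 * 2),
    ("jpsi", 128, 16, 8, 2 * 4),
    ("ds", 128, 4, 32, 4 * 8),
    ("bc", 128, 4, 32, 4 * 8),
]


def layout_is_consistent(processes, nodes, per_node, cores_per_node):
    """True if the node count fills the job and each core runs one process."""
    return nodes * per_node == processes and per_node == cores_per_node


def local_sites(dims, ls=1):
    """Lattice sites held by one node; ls is the DWF fifth-dimension extent."""
    return prod(dims) * ls


for name, *layout in runs:
    print(name, "consistent:", layout_is_consistent(*layout))

print("asqtad 14^4 local sites:", local_sites((14, 14, 14, 14)))
print("DWF 32x8x8x8, Ls=16:", local_sites((32, 8, 8, 8), ls=16))
print("DWF 32x8x8x12, Ls=16:", local_sites((32, 8, 8, 12), ls=16))
```

The two DWF local volumes differ by a factor of 1.5, which is why the quoted figure is described as an average of the two runs.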