*The LQ2 worker node performance figures are for the four-GPU Dslash kernel.
The table above shows the measured performance of DWF and asqtad inverters on all the Fermilab LQCD clusters. For qcd and pion, the asqtad figures come from 64-node runs with a 14^4 local lattice per node, and the DWF figures from 64-node runs with Ls=16, averaging the performance of 32x8x8x8 and 32x8x8x12 local-lattice runs. The DWF and asqtad figures for kaon use 128-process (32-node) runs with 4 processes per node; for jpsi, 128-process (16-node) runs with 8 processes per node; and for ds and bc, 128-process (4-node) runs with 32 processes per node. In all three cases there is one process per core.
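To make these run geometries concrete, here is a minimal Python sketch (illustrative only, not part of the original benchmark code; the helper name local_sites is hypothetical) that computes the per-node problem sizes quoted above, counting 5-D sites for DWF by multiplying the 4-D local volume by Ls.

```python
# Minimal sketch: per-node problem sizes for the benchmark runs described above.
from math import prod

def local_sites(local_lattice, ls=1):
    """4-D local lattice volume, times Ls for the 5-D DWF action."""
    return prod(local_lattice) * ls

# asqtad on qcd/pion: 64-node runs, 14^4 local lattice per node
asqtad = local_sites((14, 14, 14, 14))        # 38,416 sites per node

# DWF on qcd/pion: Ls=16, performance averaged over two local volumes
dwf_a = local_sites((32, 8, 8, 8), ls=16)     # 262,144 5-D sites per node
dwf_b = local_sites((32, 8, 8, 12), ls=16)    # 393,216 5-D sites per node

print(asqtad, dwf_a, dwf_b)
```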
qcd: 120-node cluster (decommissioned April 2010) with single-socket 2.8 GHz Pentium 4 processors and a Myrinet fabric.
pion: 486-node cluster (decommissioned April 2010) with single-socket 3.2 GHz Pentium 640 processors and an SDR InfiniBand fabric.
kaon: 600-node cluster (decommissioned August 2013) with dual-socket dual-core Opteron 270 (2.0 GHz) processors and a DDR Mellanox InfiniBand fabric.
jpsi: 856-node cluster (decommissioned May 19, 2014) with dual-socket quad-core Opteron 2352 (2.1 GHz) processors and a DDR Mellanox InfiniBand fabric.
ds: 420-node cluster (224 nodes decommissioned August 2016, 196 nodes decommissioned April 2020) with quad-socket eight-core Opteron 6128 (2.0 GHz) processors and a QDR Mellanox InfiniBand fabric.
dsg: 76-node cluster (decommissioned April 2020) with dual-socket quad-core Intel Xeon E5630 (2.53 GHz) processors, two NVIDIA Tesla M2050 GPUs per node and a QDR Mellanox InfiniBand fabric.
bc: 224-node cluster (decommissioned April 2020) with quad-socket eight-core Opteron 6320 (2.8 GHz) processors and a QDR Mellanox InfiniBand fabric.
π: 314-node cluster (decommissioned and repurposed April 2020) with dual-socket eight-core Intel Xeon E5-2650 v2 “Ivy Bridge” (2.6 GHz) processors and a QDR Mellanox InfiniBand fabric.
π0g: 32-node cluster (decommissioned and repurposed April 2020) with dual-socket eight-core Intel Xeon E5-2650 v2 “Ivy Bridge” (2.6 GHz) processors, four NVIDIA Tesla K40m GPUs per node and a QDR Mellanox InfiniBand fabric.
LQ1: 183-node cluster with dual-socket 20-core Intel Xeon Gold 6248 “Cascade Lake” (2.5 GHz) processors and a 100 Gb/s Intel Omni-Path fabric.
LQ2: 18-node cluster with dual-socket 32-core 3rd Gen AMD EPYC 7543 (2.8 GHz) processors, four NVIDIA A100 (80 GB) GPUs per node and dual NDR200 NVIDIA/Mellanox InfiniBand adapters per node.