Hardware

Current

| Cluster | Processor | Nodes | Cores | Per-Node RAM | DWF Performance (GFlops/node) | asqtad Performance (GFlops/node) | In Service |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LQ1 | Dual 20-core 2.5 GHz Intel Xeon CPUs | 179 | 7,160 | 196 GB | 370.0 | 280.0 | 2020-present |
| LQ2 | Quad NVIDIA A100-80 GPUs and dual 32-core 2.8 GHz AMD CPUs | 18 | CPU: 1,152 / GPU: 72 | 1 TB | 4524.5* | 1357.0* | 2023-present |
*The LQ2 worker node performance figures are the four-GPU Dslash kernel performance.

The table above shows the measured performance of DWF and asqtad inverters on the current Fermilab LQCD clusters.

LQ1: 179-node cluster with dual-socket 20-core Intel Xeon Gold 6248 “Cascade Lake” (2.5 GHz) processors and an EDR Omni-Path fabric; 196 GB of RAM per node.
LQ2: 18-node cluster with quad NVIDIA A100 GPUs with 80 GB of HBM2e memory per GPU; NVLink point-to-point mesh interconnecting the GPUs; dual InfiniBand interfaces with 200 Gbps aggregate bandwidth; dual 3rd-generation AMD EPYC 7543 32-core 2.8 GHz processors; 1 TB of system RAM per node.
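For a rough sense of total capacity, the per-node figures in the table can be scaled by the node counts. The short Python sketch below does exactly that; the inputs are copied from the table, and the assumption of perfectly linear scaling across nodes is an idealization used only for illustration.

```python
# Rough aggregate sustained performance for the current clusters, assuming
# the per-node inverter figures from the table scale linearly with node count.
# (Illustration only; real multi-node jobs lose some efficiency to communication.)

clusters = {
    # name: (nodes, DWF GFlops/node, asqtad GFlops/node)
    "LQ1": (179, 370.0, 280.0),
    "LQ2": (18, 4524.5, 1357.0),  # four-GPU Dslash kernel figures
}

for name, (nodes, dwf, asqtad) in clusters.items():
    # Convert GFlops to TFlops for readability.
    print(f"{name}: ~{nodes * dwf / 1000:.1f} TFlops DWF, "
          f"~{nodes * asqtad / 1000:.1f} TFlops asqtad aggregate")
```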

Retired

| Cluster | Processor | Nodes | Cores | DWF Performance (GFlops/node) | asqtad Performance (GFlops/node) | In Service |
| --- | --- | --- | --- | --- | --- | --- |
| qcd | 2.8 GHz Single CPU Single Core P4E | 127 | 127 | 1.400 | 1.017 | 2002-2010 |
| pion | 3.2 GHz Single CPU Single Core Pentium 640 | 486 | 486 | 1.729 | 1.594 | 2004-2010 |
| kaon | 2.0 GHz Dual CPU Dual Core Opteron | 600 | 2,400 | 4.703 | 3.832 | 2006-2013 |
| jpsi | 2.1 GHz Dual CPU Quad Core Opteron | 856 | 6,848 | 10.06 | 9.563 | 2008-2014 |
| ds | 2.0 GHz Quad CPU Eight Core Opteron | 420 | 13,440 | 51.52 | 50.55 | 2010-2020 |
| bc | 2.8 GHz Quad CPU Eight Core Opteron | 224 | 7,168 | 57.41 | 56.22 | 2013-2020 |
| pi0 | 2.6 GHz Dual CPU Eight Core Intel | 314 | 1,152 | 78.31 | 61.49 | 2014-2020 |

The table above shows the measured performance of DWF and asqtad inverters on the retired Fermilab LQCD clusters. For qcd and pion, the asqtad numbers were taken on 64-node runs with a 14^4 local lattice per node, and the DWF numbers were taken on 64-node runs using Ls=16, averaging the performance of 32x8x8x8 and 32x8x8x12 local-lattice runs. The DWF and asqtad performance figures for kaon use 128-process (32-node) runs, with 4 processes per node (one process per core). The figures for jpsi use 128-process (16-node) runs, with 8 processes per node (one process per core). The figures for ds and bc use 128-process (4-node) runs, with 32 processes per node (one process per core).
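As a small illustration of the benchmark accounting in the paragraph above, the sketch below computes the number of MPI ranks and the implied total lattice volume for the qcd/pion run layouts. The geometry values come from the text; the helper itself is a hypothetical convenience written for this page, and it simply treats the DWF fifth dimension (Ls) as an extra factor in the local volume.

```python
# Benchmark geometry accounting: ranks = nodes * ranks_per_node, and the total
# lattice volume is the local volume per rank times the number of ranks.
# Illustrative sketch only, not part of any benchmark code.

def run_summary(name, nodes, ranks_per_node, local_dims):
    ranks = nodes * ranks_per_node
    local_sites = 1
    for d in local_dims:
        local_sites *= d
    print(f"{name}: {ranks} ranks, {local_sites:,} sites/rank, "
          f"{ranks * local_sites:,} sites total")

# asqtad on qcd/pion: 64 nodes, one rank per (single-core) node, 14^4 local lattice.
run_summary("asqtad 14^4", nodes=64, ranks_per_node=1, local_dims=(14, 14, 14, 14))

# DWF on qcd/pion: 64 nodes, averaging 32x8x8x8 and 32x8x8x12 local lattices,
# with Ls = 16 counted here as a fifth local dimension.
for dims in ((32, 8, 8, 8), (32, 8, 8, 12)):
    run_summary(f"DWF {'x'.join(map(str, dims))} Ls=16",
                nodes=64, ranks_per_node=1, local_dims=dims + (16,))
```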

qcd: 120-node cluster (decommissioned April 2010) with single-socket 2.8 GHz Pentium 4 processors and a Myrinet fabric.
pion: 486-node cluster (decommissioned April 2010) with single-socket 3.2 GHz Pentium 640 processors and an SDR InfiniBand fabric.
kaon: 600-node cluster (decommissioned August 2013) with dual-socket dual-core Opteron 270 (2.0 GHz) processors and a DDR Mellanox InfiniBand fabric.
jpsi: 856-node cluster (decommissioned May 19, 2014) with dual-socket quad-core Opteron 2352 (2.1 GHz) processors and a DDR Mellanox InfiniBand fabric.
ds: 420-node cluster (224 nodes decommissioned August 2016, 196 nodes decommissioned April 2020) with quad-socket eight-core Opteron 6128 (2.0 GHz) processors and a QDR Mellanox InfiniBand fabric.
dsg: 76-node cluster (decommissioned April 2020) with dual-socket four-core Intel Xeon E5630 processors, two NVIDIA Tesla M2050 GPUs per node, and a QDR Mellanox InfiniBand fabric.
bc: 224-node cluster (decommissioned April 2020) with quad-socket eight-core Opteron 6320 (2.8 GHz) processors and a QDR Mellanox InfiniBand fabric.
pi0: 314-node cluster (decommissioned and repurposed April 2020) with dual-socket eight-core Intel E5-2650v2 “Ivy Bridge” (2.6 GHz) processors and a QDR Mellanox InfiniBand fabric.
pi0g: 32-node cluster (decommissioned and repurposed April 2020) with dual-socket eight-core Intel E5-2650v2 “Ivy Bridge” (2.6 GHz) processors, four NVIDIA Tesla K40m GPUs per node, and a QDR Mellanox InfiniBand fabric.