{"id":456,"date":"2020-10-02T12:46:21","date_gmt":"2020-10-02T17:46:21","guid":{"rendered":"https:\/\/computing.fnal.gov\/wilsoncluster\/?page_id=456"},"modified":"2024-02-07T15:45:17","modified_gmt":"2024-02-07T21:45:17","slug":"slurm-job-scheduler","status":"publish","type":"page","link":"https:\/\/computing.fnal.gov\/wilsoncluster\/slurm-job-scheduler\/","title":{"rendered":"SLURM job scheduler"},"content":{"rendered":"\n<p>Slurm workload manager, formerly known as Simple Linux Utility For Resource Management (SLURM), is an open source, fault-tolerant, and highly scalable resource manager and job scheduling system of high availability currently developed by&nbsp;<a href=\"https:\/\/slurm.schedmd.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">SchedMD<\/a>. Initially developed for large Linux Clusters at the Lawrence Livermore National Laboratory, Slurm is used extensively on most Top 500 supercomputers around the globe.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-content\/uploads\/2024\/01\/image.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"545\" src=\"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-content\/uploads\/2024\/01\/image.png\" alt=\"\" class=\"wp-image-7118\" srcset=\"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-content\/uploads\/2024\/01\/image.png 1024w, https:\/\/computing.fnal.gov\/wilsoncluster\/wp-content\/uploads\/2024\/01\/image-300x160.png 300w, https:\/\/computing.fnal.gov\/wilsoncluster\/wp-content\/uploads\/2024\/01\/image-768x409.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Caveat concerning batch job submission<\/mark><\/h3>\n\n\n\n<p><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Please note that Slurm batch 
jobs will NOT run from your home directory on the Wilson cluster!<\/mark><\/strong><\/p>\n\n\n\n<p>Your Wilson home directory, <code>$HOME<\/code>, is your lab-wide &#8220;nashome&#8221; directory. The&nbsp;<code>\/nashome<\/code>&nbsp;filesystem is mounted with Kerberos authentication, and unfortunately, Kerberos is <strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">not compatible<\/mark><\/strong> with Slurm. Start batch jobs from your area under either&nbsp;<code>\/work1<\/code>&nbsp;or&nbsp;<code>\/wclustre<\/code>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Common Slurm commands<\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>command<\/strong><\/td><td><strong>brief<\/strong> <strong>description<\/strong><\/td><\/tr><tr><td><a href=\"https:\/\/slurm.schedmd.com\/squeue.html\">squeue<\/a><\/td><td>reports the state of queued jobs<\/td><\/tr><tr><td><a href=\"https:\/\/slurm.schedmd.com\/sbatch.html\">sbatch<\/a><\/td><td>submit a job script for later execution<\/td><\/tr><tr><td><a href=\"https:\/\/slurm.schedmd.com\/scancel.html\">scancel<\/a><\/td><td>cancel a pending or running job<\/td><\/tr><tr><td><a href=\"https:\/\/slurm.schedmd.com\/scontrol.html\">scontrol<\/a><\/td><td>view or modify Slurm configuration and state<\/td><\/tr><tr><td><a href=\"https:\/\/slurm.schedmd.com\/salloc.html\">salloc<\/a><\/td><td>allocate resources and spawn a shell which is then used to execute srun commands<\/td><\/tr><tr><td><a href=\"https:\/\/slurm.schedmd.com\/srun.html\">srun<\/a><\/td><td>submit a job for execution or initiate job steps in real time<\/td><\/tr><tr><td><a href=\"https:\/\/slurm.schedmd.com\/sacctmgr.html\">sacctmgr<\/a><\/td><td>view and modify Slurm account information<\/td><\/tr><tr><td><a href=\"https:\/\/slurm.schedmd.com\/sinfo.html\">sinfo<\/a><\/td><td>worker node and partition information<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Table 
1 &#8211; Common Slurm Commands<\/figcaption><\/figure>\n\n\n\n<p>See the&nbsp;<a href=\"https:\/\/slurm.schedmd.com\/pdfs\/summary.pdf\">downloadable<\/a>&nbsp;PDF cheatsheet for a summary of the commands. SchedMD has a&nbsp;<a href=\"https:\/\/slurm.schedmd.com\/quickstart.html\">quick start guide<\/a>&nbsp;for Slurm users.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">squeue examples: checking the status of batch jobs<\/h4>\n\n\n\n<p>The command below checks the status of your jobs in the queue.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ squeue -u $USER<\/code><\/pre>\n\n\n\n<p>The following command will show jobs assigned to the wc_cpu (CPU) partition.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ squeue -p wc_cpu<\/code><\/pre>\n\n\n\n<p>The&nbsp;<code>squeue<\/code>&nbsp;command with the options below will tell you the status of GPU batch jobs including which workers are in use, how many GPUs a job is using, the number of cores, maximum memory, how long the job has been running, and the time limit for the job.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ squeue -p wc_gpu --Format=Account:.10,UserName:.10,NodeList:.10,tres-alloc:.64,State:.8,TimeUsed:.10,TimeLimit:.12<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Details about your Slurm account(s)<\/h4>\n\n\n\n<p>Every Slurm user has a default account. You can find your default with the command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ sacctmgr list user name=$USER\n      User   Def Acct     Admin\n---------- ---------- ---------\n     smith    wc_test      None<\/code><\/pre>\n\n\n\n<p>Users on Wilson may have multiple Slurm accounts with different quality of service (QOS) levels. 
You can find all your associated accounts along with available QOS levels using the following command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ sacctmgr list user WithAssoc Format=User,Account,QOS,DefaultQOS name=$USER\n      User    Account                  QOS   Def QOS\n---------- ---------- -------------------- ---------\n     smith    hpcsoft             opp,test       opp\n     smith   scd_devs     opp,regular,test       opp\n     smith  spack4hpc     opp,regular,test       opp\n     smith    wc_test     opp,regular,test       opp<\/code><\/pre>\n\n\n\n<p><strong>NOTE:<\/strong>&nbsp;If you do not specify an account name during your job submission (using&nbsp;<code>--account<\/code>), your default account will be used.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Quality of service (QoS) levels<\/h4>\n\n\n\n<p>Jobs submitted to Slurm are associated with an appropriate QoS (or Quality of Service) configuration. Admins assign parameters to a QoS that are used to manage dispatch priority and resource use limits.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Name<\/strong><\/td><td><strong>Description<\/strong><\/td><td><strong>Priority<\/strong> <strong>(higher is better)<\/strong><\/td><td><strong>Global Resource Constraint<\/strong>s<\/td><td><strong>Max Wall<\/strong><strong>time<\/strong><\/td><td><strong>Per Account Constraints<\/strong><\/td><td><strong>Per User Constraints<\/strong><\/td><\/tr><tr><td>admin<\/td><td>admin testing<\/td><td>100<\/td><td>None<\/td><td>None<\/td><td>None<\/td><td>None<\/td><\/tr><tr><td>test<\/td><td>for quick user testing<\/td><td>75<\/td><td>Max nodes = 5<br>Max GPUs = 4<\/td><td>04:00:00<\/td><td>None<\/td><td>Max running jobs = 1<br>Max queued jobs = 3<\/td><\/tr><tr><td>regular<\/td><td>regular QoS for approved accounts<\/td><td>25<\/td><td>None<\/td><td>1-00:00:00<\/td><td>None<\/td><td>None<\/td><\/tr><tr><td>walltime7d<\/td><td>only available to certain approved 
accounts<\/td><td>25<\/td><td>Max nodes = 50<\/td><td>7-00:00:00<\/td><td>None<\/td><td>None<\/td><\/tr><tr><td>opp<\/td><td>available to all accounts for opportunistic usage<\/td><td>0<\/td><td>None<\/td><td>08:00:00<\/td><td>None<\/td><td>Max queued jobs = 50<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Table 2 &#8211; Wilson cluster QoS overview<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>A few notes about the resource constraints:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Global resource constraints are enforced across all the accounts in the cluster. For example, the <code>test<\/code> QoS restricts access to 4 GPUs globally. If account <code>A<\/code> is using 4 GPUs, account <code>B<\/code> has to wait until the resources are free.<\/li>\n\n\n\n<li>Per account constraints are enforced on an account basis. We currently do not have per account constraints on the Wilson cluster.<\/li>\n\n\n\n<li>Per user constraints are enforced on a user basis. For example, the <code>test<\/code> QoS restricts the number of running jobs per user to 1. This means a single user, regardless of their account, cannot run more than a single job under this QoS.<\/li>\n\n\n\n<li>Finally, these constraints may be relaxed or adjusted from time to time based on the job mix and to maximize cluster utilization.<\/li>\n<\/ul>\n\n\n\n<p>A few notes about the available QoS levels:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Users select a QoS with the <code>--qos<\/code> directive in their submit commands. The default QoS for all accounts is <code>opp<\/code>. Jobs running under this QoS have the lowest priority and will only start when there aren&#8217;t any eligible <code>regular<\/code> QoS jobs waiting in the queue.<\/li>\n\n\n\n<li>The&nbsp;<code>test<\/code>&nbsp;QoS is for users to run test jobs for fast turnaround debugging. These test jobs run at a relatively high priority so that they will start as soon as nodes are available. 
Any user can have no more than three jobs submitted and no more than one job running at any given time.<\/li>\n\n\n\n<li>All Wilson users have opportunistic access to batch resources. Compute projects with specific scientific or engineering goals may request access to a higher QoS level and more compute resources.<\/li>\n<\/ul>\n\n\n\n<p><strong>NOTE:<\/strong> If you do not specify a QoS during job submission (using <code>--qos<\/code>), the default <code>opp<\/code> will be used.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Slurm partitions<\/h4>\n\n\n\n<p>A partition in Slurm is a way to categorize worker nodes by their unique features. On the Wilson cluster we distinguish workers meant for CPU computing from GPU-accelerated workers. There is a separate partition for the one IBM Power9 &#8220;Summit-like&#8221; worker since the Power9 architecture is not binary compatible with the common AMD\/Intel x86_64 architecture. There is also a test partition to set aside CPU workers for rapid testing. 
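<\/p>\n\n\n\n<p>As a quick sketch, the partitions and their current node states can be listed from a command line with <code>sinfo<\/code> (the summary flag <code>-s<\/code> and partition filter <code>-p<\/code> are standard <code>sinfo<\/code> options; the exact output depends on the state of the cluster):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ sinfo -s -p wc_cpu,wc_gpu<\/code><\/pre>\n\n\n\n<p>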
Slurm allows setting job limits by partition.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Name<\/strong><\/td><td><strong>Description<\/strong><\/td><td><strong>Total Nodes<\/strong><\/td><td><strong>Default Runtime<\/strong><\/td><td class=\"has-text-align-left\" data-align=\"left\"><strong>Exclusive Access<\/strong><\/td><\/tr><tr><td>wc_cpu<\/td><td>CPU workers<br><br>2.6 GHz Intel E5-2650v2 \u201cIvy Bridge\u201d, 16 cores\/node, 8GB\/core memory, ~280GB local scratch disk, inter-node QDR (40Gbps) Infiniband<\/td><td>90<\/td><td>08:00:00<\/td><td class=\"has-text-align-left\" data-align=\"left\">Y<\/td><\/tr><tr><td>wc_cpu_test<\/td><td>CPU workers<br><br>2.6 GHz Intel E5-2650v2 \u201cIvy Bridge\u201d, 16 cores\/node, 8GB\/core memory, ~280GB local scratch disk, inter-node QDR (40Gbps) Infiniband<\/td><td>7<\/td><td>01:00:00<\/td><td class=\"has-text-align-left\" data-align=\"left\">Y<\/td><\/tr><tr><td>wc_gpu<\/td><td>GPU workers<br><br>Several types of GPUs such as V100, P100, A100. More details in Table 4 below<\/td><td>7<\/td><td>08:00:00<\/td><td class=\"has-text-align-left\" data-align=\"left\">N<\/td><\/tr><tr><td>wc_gpu_ppc<\/td><td>IBM Power9 with GPUs (V100)<\/td><td>1<\/td><td>08:00:00<\/td><td class=\"has-text-align-left\" data-align=\"left\">N<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Table 3 &#8211; Wilson cluster Slurm partition overview<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The desired partition is selected by specifying the <code>--partition<\/code> flag in a batch submission. On Wilson, the CPU workers are scheduled for exclusive use by a single job. The default is that the GPU workers permit shared use by multiple jobs \/ users. 
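<\/p>\n\n\n\n<p>For example, a minimal submission targeting the CPU partition might look like the following (the script name is illustrative):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ sbatch --partition=wc_cpu my_job.sh<\/code><\/pre>\n\n\n\n<p>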
The option <code>--exclusive<\/code> can be used to request that a job be given exclusive use of the worker nodes.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Slurm job dispatch and priority<\/h4>\n\n\n\n<p>Slurm on the Wilson cluster primarily uses QoS to manage partition access and job priorities. All users submit their jobs to be run by Slurm on a particular resource within one of several partitions. We do not use any form of preemption on our cluster. The resource constraints we have in place ensure that multiple projects or accounts can be active on the cluster at any given time and that available resources are shared fairly.<\/p>\n\n\n\n<p>To see the list of jobs currently in the queue by partition, visit our&nbsp;<a href=\"https:\/\/landscape.fnal.gov\/hpc\/d\/-Hj8eccSz\/wilson-cluster-status?orgId=1\">cluster status<\/a> page. Click on the &#8220;Start Time&#8221; column header to sort the table by start time. Don&#8217;t be alarmed if you see dates from 1969 for idle jobs. This just means Slurm hasn&#8217;t gotten to those jobs yet and monitoring is showing the default. <\/p>\n\n\n\n<p>For running jobs, &#8220;Start Time&#8221; is the actual time that the jobs started. Following that are the pending jobs in the predicted order they may start. You can also click on the &#8220;Submit Time&#8221; column header to see which jobs have been waiting the longest. There are filters in the top right corner to select partitions and users.<\/p>\n\n\n\n<p>From a command line, Slurm&#8217;s <code>squeue<\/code> command lists the jobs that are queued. It includes running jobs as well as those waiting to be started, i.e., dispatched. 
By changing the format of the command&#8217;s output, one can obtain additional details, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start time &#8211; actual or predicted<\/li>\n\n\n\n<li>QoS the job is running under<\/li>\n\n\n\n<li>Reason that the job is pending<\/li>\n\n\n\n<li>Calculated dispatch real-time priority of the job<\/li>\n<\/ul>\n\n\n\n<p>The following command should return a sorted (by priority) list of your jobs:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ squeue --sort=P,-p --user=$USER<\/code><\/pre>\n\n\n\n<p>The following command uses the <code>--Format<\/code> option of <code>squeue<\/code> to provide even more details. You can tweak the decimal point values to adjust column width. The Slurm <code>squeue<\/code> manual page <a href=\"https:\/\/slurm.schedmd.com\/squeue.html\" data-type=\"link\" data-id=\"https:\/\/slurm.schedmd.com\/squeue.html\">here<\/a> has details on the options for <code>--Format<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ squeue --sort=P,-p --Format=Account:.10,UserName:.10,JobID:.8,Name:.12,PriorityLong:.10,State:.5,QOS:.8,SubmitTime:.20,StartTime:.20,TimeLimit:.11,Reason:.15 --user=$USER<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Specifying the number and type of GPUs in a job<\/h4>\n\n\n\n<p>All GPU worker nodes are in the&nbsp;<code>wc_gpu<\/code>&nbsp;partition except the IBM Power9 worker, which is in the&nbsp;<code>wc_gpu_ppc<\/code>&nbsp;partition. The Wilson cluster has GPU worker nodes with different generations of NVIDIA GPUs. The &#8220;Slurm spec&#8221; in the table below is how you tell Slurm the GPU type you want. 
If you do not specify a type, your job will run on the first available GPU of any type.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>nodes<\/strong><\/td><td><strong>GPU type<\/strong><\/td><td><strong>CUDA cores \/ GPU<\/strong><\/td><td><strong>device memory [GB]<\/strong><\/td><td><strong>Slurm spec<\/strong><\/td><td><strong>GPUs \/ node<\/strong><\/td><td><strong>CPU cores \/ GPU<\/strong><\/td><td><strong>Memory \/ GPU (GB)<\/strong><\/td><\/tr><tr><td>2<\/td><td>A100<\/td><td>6912<\/td><td>80<\/td><td>a100<\/td><td>2<\/td><td>32<\/td><td>256<\/td><\/tr><tr><td>1<\/td><td>A100<\/td><td>6912<\/td><td>80<\/td><td>a100<\/td><td>4<\/td><td>16<\/td><td>126<\/td><\/tr><tr><td>4<\/td><td>V100<\/td><td>5120<\/td><td>32<\/td><td>v100<\/td><td>2<\/td><td>20<\/td><td>92<\/td><\/tr><tr><td>1<\/td><td>P100<\/td><td>3584<\/td><td>16<\/td><td>p100<\/td><td>8<\/td><td>2<\/td><td>92<\/td><\/tr><tr><td>1<\/td><td>P100<\/td><td>3584<\/td><td>16<\/td><td>p100nvlink<\/td><td>2<\/td><td>14<\/td><td>500<\/td><\/tr><tr><td>1<\/td><td>V100<\/td><td>5120<\/td><td>32<\/td><td>v100nvlinkppc64<\/td><td>4<\/td><td>8 (32 threads)<\/td><td>250<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Table 4 &#8211; Wilson cluster GPU overview<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>A GPU job can request more than one GPU to enable parallel GPU use by your code. Slurm, however, will not permit mixing different Slurm specifications within a job; e.g., a job cannot request eight V100 plus eight P100 devices. GPUs on a worker are partitioned among jobs, up to the maximum number of GPUs in a node. Each job is assigned exclusive use of its GPUs; an individual GPU is never shared by different jobs.<\/p>\n\n\n\n<p>Slurm manages GPUs as a generic resource via the&nbsp;<code>--gres<\/code>&nbsp;flag to the commands&nbsp;<code>sbatch<\/code>,&nbsp;<code>salloc<\/code>, or&nbsp;<code>srun<\/code>. 
The table below shows examples of how to choose GPUs in a batch submission.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Slurm options<\/strong><\/td><td><strong>Description<\/strong><\/td><\/tr><tr><td><code>--gres=gpu:1<\/code><\/td><td>one GPU per node of any type<\/td><\/tr><tr><td><code>--gres=gpu:p100:1<\/code><\/td><td>one P100 GPU per node<\/td><\/tr><tr><td><code>--gres=gpu:v100:2<\/code><\/td><td>two V100 GPUs per node &#8212; the max on the Intel V100 workers<\/td><\/tr><tr><td><code>--gres=gpu:v100:2 --nodes=2<\/code><\/td><td>total of four V100 GPUs, requires two worker nodes<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Table 5 &#8211; Wilson cluster GPU access flags overview<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>For the latter two examples, the submission request should include the <code>--exclusive<\/code> option since the request asks for all GPUs on the worker nodes.<\/p>\n\n\n\n<p>The batch system is configured to portion out CPU and RAM to each batch job. GPU jobs are assigned default values&nbsp;<code>--cpus-per-gpu=2<\/code>&nbsp;and&nbsp;<code>--mem-per-gpu=30G<\/code>. Jobs may request more than the default values. Users should override the defaults using the suggested <code>Cores\/GPU<\/code>&nbsp;and&nbsp;<code>Mem\/GPU<\/code>&nbsp;values in the above table.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Interactive jobs<\/h4>\n\n\n\n<h5 class=\"wp-block-heading\">Interactive job on CPU-only workers<\/h5>\n\n\n\n<p>The command below starts an interactive batch job.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The job uses a single worker node of the <code>wc_cpu<\/code> (CPU-only) partition. <\/li>\n\n\n\n<li>The job is run under the Slurm account <code>myAccount<\/code> at <code>regular<\/code> QoS. <\/li>\n\n\n\n<li>The job requests a time limit of 50 minutes. <\/li>\n\n\n\n<li>The job requests one task and 16 CPUs per task. 
Specifying CPUs per task is important for threaded applications. If the single task runs 16 threads, they will be able to use all 16 cores on this worker. Other combinations of nodes, tasks, and cores per task are acceptable. <\/li>\n\n\n\n<li>Since CPU-only workers are not shared among jobs, available compute resources are usually maximized by choosing <code>ntasks * cpus-per-task = nodes * SLURM_CPUS_ON_NODE<\/code>.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>$ cd \/work1_or_wclustre\/your_projects_directory\/\n$ srun --unbuffered --pty --partition=wc_cpu --time=00:50:00 \\\n       --account=myAccount --qos=regular \\\n       --nodes=1 --ntasks=1 --cpus-per-task=16 \/bin\/bash\n$ hostname\nwcwn001.fnal.gov\n$ env | grep -e TASKS -e CPUS -e CPU_BIND_\nSLURM_CPU_BIND_VERBOSE=quiet\nSLURM_CPUS_PER_TASK=16\nSLURM_TASKS_PER_NODE=1\nSLURM_STEP_TASKS_PER_NODE=1\nSLURM_NTASKS=1\nSLURM_JOB_CPUS_PER_NODE=16\nSLURM_CPUS_ON_NODE=16\nSLURM_CPU_BIND_LIST=0xFFFF\nSLURM_CPU_BIND_TYPE=mask_cpu:\nSLURM_STEP_NUM_TASKS=1\n$ exit  # end the batch job\n$ hostname\nwc.fnal.gov<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">Interactive job on a GPU node<\/h5>\n\n\n\n<p>The command that follows requests a one-hour interactive job on a GPU worker. In this example, we specify a single GPU device of type V100 using the syntax from Table 5. The request also specifies the number of CPU cores and system RAM per GPU from Table 4 for the V100 worker nodes. 
Specifying no more than the maximum values of cores and memory per GPU listed in Table 4 will fairly share these resources among jobs that may share the worker node.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ srun --unbuffered --pty --partition=wc_gpu --time=01:00:00 \\\n       --account=myAccount --qos=regular \\\n       --nodes=1 --ntasks=1 \\\n       --gres=gpu:v100:1 --cpus-per-gpu=20 --mem-per-gpu=92G \\\n       \/bin\/bash\n$ hostname\nwcgpu04.fnal.gov\n$ nvidia-smi --list-gpus\nGPU 0: Tesla V100-PCIE-32GB\n$ env | grep -e TASKS -e CPUS -e CPU_BIND_ -e GPU_\nSLURM_CPU_BIND_VERBOSE=quiet\nSLURM_TASKS_PER_NODE=1\nSLURM_STEP_TASKS_PER_NODE=1\nSLURM_NTASKS_PER_NODE=1\nSLURM_CPUS_PER_GPU=20\nSLURM_NTASKS=1\nSLURM_JOB_CPUS_PER_NODE=20\nSLURM_CPUS_ON_NODE=20\nSLURM_CPU_BIND_LIST=0x003FF003FF\nSLURM_CPU_BIND_TYPE=mask_cpu:\nSLURM_STEP_NUM_TASKS=1\nGPU_DEVICE_ORDINAL=0<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Batch jobs and scripts<\/h4>\n\n\n\n<p>Slurm can run a sequence of shell commands specified within a batch script. Typically, such scripts specify a set of parameters for Slurm at the top of the script file. These parameter lines are prefaced with <code>#SBATCH<\/code>.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Running a simple CPU batch script<\/h5>\n\n\n\n<p>In this example we request a single CPU-only worker node with four tasks and four CPU cores per task. Their product equals the sixteen cores on these workers. This combination of tasks and cores per task is typical when running a total of four MPI ranks where each MPI rank will run four threads.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ cat batch_cpu.sh\n#! 
\/bin\/bash\n#SBATCH --account=myAccount\n#SBATCH --qos=test\n#SBATCH --time=00:15:00\n#SBATCH --partition=wc_cpu\n#SBATCH --nodes=1\n#SBATCH --ntasks=4\n#SBATCH --cpus-per-task=4\n#SBATCH --job-name=cpu_test\n#SBATCH --mail-type=NONE\n#SBATCH --output=job_%x.o%A\n#SBATCH --no-requeue\n\nhostname\n\nenv | grep -e TASKS -e CPUS -e CPU_BIND_\n<\/code><\/pre>\n\n\n\n<p>We submit the batch job with the command below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ sbatch batch_cpu.sh<\/code><\/pre>\n\n\n\n<p>This example produced a batch output file named <code>job_cpu_test.o527837<\/code> in the directory where the job was submitted. The output is shown below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ cat job_cpu_test.o527837\nwcwn026.fnal.gov\nSLURM_CPUS_PER_TASK=4\nSLURM_TASKS_PER_NODE=4\nSLURM_NTASKS=4\nSLURM_JOB_CPUS_PER_NODE=16\nSLURM_CPUS_ON_NODE=16<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">Running a simple GPU batch script<\/h5>\n\n\n\n<p>In this example, we will run a batch script called <code>batch_gpu.sh<\/code>. The <code>cat<\/code> command that follows displays the content of this file. Note that this job requests a single V100 GPU and a &#8220;fair share&#8221; of CPU cores and host memory per GPU on the shared GPU worker.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ cat batch_gpu.sh\n#! 
\/bin\/bash\n#SBATCH --account=myAccount\n#SBATCH --qos=test\n#SBATCH --time=00:15:00\n#SBATCH --partition=wc_gpu\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --gres=gpu:v100:1\n#SBATCH --cpus-per-gpu=20\n#SBATCH --mem-per-gpu=92G\n#SBATCH --job-name=gpu_test\n#SBATCH --mail-type=NONE\n#SBATCH --output=job_%x.o%A\n#SBATCH --no-requeue\n\nhostname\n\nnvidia-smi --list-gpus\n\nenv | grep -e TASKS -e CPUS -e CPU_BIND_ -e GPU_<\/code><\/pre>\n\n\n\n<p>We submit this script to Slurm:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ sbatch batch_gpu.sh<\/code><\/pre>\n\n\n\n<p>The job will be placed in the batch queue and, upon job completion, the directory where the job was submitted will contain a file named <code>job_gpu_test.o527838<\/code>. Below we display the result from this job.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ cat job_gpu_test.o527838\nwcgpu04.fnal.gov\nGPU 0: Tesla V100-PCIE-32GB (UUID: GPU-dca4cd4b-03e3-1021-36d2-916a1e59be96)\nSLURM_TASKS_PER_NODE=1\nSLURM_CPUS_PER_GPU=20\nSLURM_NTASKS=1\nSLURM_JOB_CPUS_PER_NODE=20\nSLURM_CPUS_ON_NODE=20\nGPU_DEVICE_ORDINAL=0<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">Running an MPI application under Slurm<\/h5>\n\n\n\n<p>Please refer to the&nbsp;XXXXX&nbsp;for instructions on running MPI in batch jobs.<\/p>\n\n\n\n<p>There&#8217;s a good description of MPI process affinity binding and&nbsp;<code>srun<\/code>&nbsp;here: <a href=\"https:\/\/doc.zih.tu-dresden.de\/jobs_and_resources\/binding_and_distribution_of_tasks\/\" target=\"_blank\" rel=\"noreferrer noopener\">task binding and distribution<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Developer access to GPU workers<\/h4>\n\n\n\n<p>Several A100-, V100-, and P100-equipped GPU workers are reserved for use by GPU developers during weekday business hours (09:00-17:00 M-F). This reservation is intended to provide developers with rapid access to GPUs for testing without having to wait for long batch jobs to finish. 
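<\/p>\n\n\n\n<p>As a sketch, a member of the developer project might request an interactive session on a reserved worker as follows (this assumes the <code>gpudevtest<\/code> reservation and <code>scd_devs<\/code> project described on this page, with the project supplied as the Slurm account in the same way as the earlier examples):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ srun --unbuffered --pty --partition=wc_gpu --time=01:00:00 \\\n       --reservation=gpudevtest --account=scd_devs \\\n       --gres=gpu:1 \/bin\/bash<\/code><\/pre>\n\n\n\n<p>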
Both interactive and batch job access is permitted on the reserved nodes. Jobs must include the parameter <code>--reservation=gpudevtest<\/code> to access the reserved nodes. The reservation is only available to members of a special GPU developer project, and the batch request must include <code>--project=scd_devs<\/code>. Users wishing to be added to the developer group must complete the <a href=\"https:\/\/fermi.servicenowservices.com\/nav_to.do?uri=%2Fcom.glideapp.servicecatalog_cat_item_view.do%3Fv%3D1%26sysparm_id%3D51e8caeddb5c1c10a5d674131f9619b8\">user request<\/a> form asking to be added to project <code>scd_devs<\/code>. Please provide a detailed justification for your request in the reason field of the request form. Approval is subject to review of your justification.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Slurm environment variables<\/h4>\n\n\n\n<p>The table below lists some of the commonly used environment variables. A full list is found in the Slurm documentation for <a href=\"https:\/\/slurm.schedmd.com\/sbatch.html\">sbatch<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Variable Name<\/strong><\/td><td><strong>Description<\/strong><\/td><td><strong>Example Value<\/strong><\/td><td><strong>PBS\/Torque analog<\/strong><\/td><\/tr><tr><td>$SLURM_JOB_ID<\/td><td>Job ID<\/td><td>5741192<\/td><td>$PBS_JOBID<\/td><\/tr><tr><td>$SLURM_JOB_NAME<\/td><td>Job Name<\/td><td>myjob<\/td><td>$PBS_JOBNAME<\/td><\/tr><tr><td>$SLURM_SUBMIT_DIR<\/td><td>Submit Directory<\/td><td>\/work1\/user<\/td><td>$PBS_O_WORKDIR<\/td><\/tr><tr><td>$SLURM_JOB_NODELIST<\/td><td>Nodes assigned to job<\/td><td>wcwn[001-005]<\/td><td>cat $PBS_NODEFILE<\/td><\/tr><tr><td>$SLURM_SUBMIT_HOST<\/td><td>Host submitted from<\/td><td>wc.fnal.gov<\/td><td>$PBS_O_HOST<\/td><\/tr><tr><td>$SLURM_JOB_NUM_NODES<\/td><td>Number of nodes allocated to job<\/td><td>2<\/td><td>$PBS_NUM_NODES<\/td><\/tr><tr><td>$SLURM_CPUS_ON_NODE<\/td><td>Number of 
cores\/node<\/td><td>8,3<\/td><td>$PBS_NUM_PPN<\/td><\/tr><tr><td>$SLURM_NTASKS<\/td><td>Total number of cores for job<\/td><td>11<\/td><td>$PBS_NP<\/td><\/tr><tr><td>$SLURM_NODEID<\/td><td>Index to node running on relative to nodes assigned to job<\/td><td>0<\/td><td>$PBS_O_NODENUM<\/td><\/tr><tr><td>$SLURM_LOCALID<\/td><td>Index to core running on within node<\/td><td>4<\/td><td>$PBS_O_VNODENUM<\/td><\/tr><tr><td>$SLURM_PROCID<\/td><td>Index to task relative to job<\/td><td>0<\/td><td>$PBS_O_TASKNUM &#8211; 1<\/td><\/tr><tr><td>$SLURM_ARRAY_TASK_ID<\/td><td>Job Array Index<\/td><td>0<\/td><td>$PBS_ARRAYID<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Table 6 &#8211; Slurm environment variables<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Slurm fixed reservation requests<\/h4>\n\n\n\n<p>Projects that require access to Wilson cluster resources within a fixed period of time or before a fixed deadline may request a reservation in advance for a designated amount of compute resources. Examples where projects may benefit from having a reservation in place include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Making Wilson compute resources available to a workshop or a hackathon.<\/li>\n\n\n\n<li>Having high-priority access to Wilson computing to meet a deadline such as preparation of a scientific paper or conference presentation.<\/li>\n<\/ul>\n\n\n\n<p>Reservations should not be considered unless there is a well-defined need that cannot be met by the standard batch system queuing policies. Reservation requests are carefully reviewed on their scientific and engineering merit and approval is not automatic. 
A reservation request must be made through a service desk request using this <a rel=\"noreferrer noopener\" href=\"https:\/\/fermi.servicenowservices.com\/nav_to.do?uri=%2Fcom.glideapp.servicecatalog_cat_item_view.do%3Fsysparm_id%3D69cd40d76fddd2005232ce026e3ee41e%26amp;sysparm_service%3D2be3e7b86fe2d600d6efce026e3ee47e%26amp;sysparm_affiliation%3D\" data-type=\"URL\" data-id=\"https:\/\/fermi.servicenowservices.com\/nav_to.do?uri=%2Fcom.glideapp.servicecatalog_cat_item_view.do%3Fsysparm_id%3D69cd40d76fddd2005232ce026e3ee41e%26amp;sysparm_service%3D2be3e7b86fe2d600d6efce026e3ee47e%26amp;sysparm_affiliation%3D\" target=\"_blank\">link<\/a>. Requests should be made at least two business days in advance of the reservation start date. Your request must contain at least the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Name of your Wilson cluster computing project<\/li>\n\n\n\n<li>Supervisor name(s)<\/li>\n\n\n\n<li>Name of the event requiring a Slurm reservation<\/li>\n\n\n\n<li>Type of event, e.g., workshop, presentation, paper publication<\/li>\n\n\n\n<li>Date and time ranges of the Slurm reservation<\/li>\n\n\n\n<li>Type (cpu or gpu) and number of workers to be reserved<\/li>\n\n\n\n<li>Justification for a special batch reservation. 
In particular, explain why the normal batch policies do not meet your needs.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Additional useful information<\/strong><\/td><\/tr><tr><td><a href=\"http:\/\/slurm.schedmd.com\/pdfs\/summary.pdf\">A quick two page summary of SLURM Commands<\/a><br><a href=\"https:\/\/slurm.schedmd.com\/quickstart.html\">Quick Start SLURM User Guide<\/a><br><a href=\"https:\/\/slurm.schedmd.com\/rosetta.pdf\">Comparison between SLURM and other popular batch schedulers<\/a><br><a href=\"https:\/\/slurm.schedmd.com\/\">Official SLURM documentation<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Slurm workload manager, formerly known as Simple Linux Utility For Resource Management (SLURM), is an open source, fault-tolerant, and highly scalable resource manager and job scheduling system of high availability currently developed by&nbsp;SchedMD. Initially developed for large Linux Clusters at the Lawrence Livermore National Laboratory, Slurm is used extensively on most Top 500 supercomputers around&#8230; <a class=\"more-link\" href=\"https:\/\/computing.fnal.gov\/wilsoncluster\/slurm-job-scheduler\/\"> More 
&#187;<\/a><\/p>\n","protected":false},"author":15,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-456","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-json\/wp\/v2\/pages\/456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-json\/wp\/v2\/comments?post=456"}],"version-history":[{"count":121,"href":"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-json\/wp\/v2\/pages\/456\/revisions"}],"predecessor-version":[{"id":7670,"href":"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-json\/wp\/v2\/pages\/456\/revisions\/7670"}],"wp:attachment":[{"href":"https:\/\/computing.fnal.gov\/wilsoncluster\/wp-json\/wp\/v2\/media?parent=456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}