Ubuntu HPC Slurm Resource Management Setup


Slurm Resource Management Setup

For setting up accounting and database support, refer to: Ubuntu HPC Slurm DB setup for Slurm accounting

This section covers only the resource-management side of Slurm: accounts, QOS, partitions, and per-user resource limits.

1. Create and Register the Cluster

Run the following on the Slurm master node:

sacctmgr -i add cluster test_hpc-cluster

Note: If the cluster already exists, the command reports this and makes no changes.

To verify:

sacctmgr list cluster

This will show the cluster name added in the previous command.
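
If the full listing is too wide, you can restrict the columns (Cluster, ControlHost and ControlPort are standard sacctmgr format fields; your controller host will differ):

sacctmgr list cluster format=cluster,controlhost,controlport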

2. Create Account

Create an account associated with the cluster:

sacctmgr add account account1 Cluster=test_hpc-cluster

When prompted with:

Would you like to commit changes?

Type `y` and press Enter.
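
To confirm the account exists and is associated with the cluster, the following standard sacctmgr queries can be used:

sacctmgr show account account1 format=account,description,organization
sacctmgr show assoc format=cluster,account,user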

3. Create QOS Levels

List existing QOS:

sacctmgr show qos format=name,priority,GrpTRES

Create new QOS levels:

sacctmgr add qos example1
sacctmgr add qos example2

Verify:

sacctmgr show qos format=name,priority,GrpTRES
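
Example output (column widths may differ; the built-in `normal` QOS created by Slurm also appears, and new QOS entries start with priority 0 and no GrpTRES):

      Name   Priority       GrpTRES
---------- ---------- -------------
    normal          0
  example1          0
  example2          0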

4. Update slurm.conf for Resource Management

Edit your shared `slurm.conf` file:

vim /export/tmp/slurm/slurm.conf

Ensure the following lines are present or updated:

AccountingStorageEnforce=associations,limits,qos,safe
GresTypes=gpu
AccountingStorageTRES=gres/gpu
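
The partition used later on this page (Test_partition1) must also be defined in `slurm.conf`. A minimal sketch, assuming the node names node1-node4 used in the GRES examples below (adapt Nodes, Default and MaxTime to your site):

PartitionName=Test_partition1 Nodes=node[1-4] Default=NO MaxTime=INFINITE State=UP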

Then copy this config to all nodes and restart services:

Master:

cp /export/tmp/slurm/slurm.conf /etc/slurm/slurm.conf
systemctl restart slurmctld

Compute Nodes:

cp /export/tmp/slurm/slurm.conf /etc/slurm/slurm.conf
systemctl restart slurmd

Login Nodes:

cp /export/tmp/slurm/slurm.conf /etc/slurm/slurm.conf
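
If passwordless SSH as root (or a user that can write /etc/slurm and restart services) is available from the master, a small loop can push the file and restart slurmd on the compute nodes. A sketch, assuming the hostnames node1-node4:

for node in node1 node2 node3 node4; do
    scp /export/tmp/slurm/slurm.conf ${node}:/etc/slurm/slurm.conf
    ssh ${node} systemctl restart slurmd
done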

5. Add User to Account and QOS

sacctmgr add user test_user Accounts=account1 Partitions=Test_partition1 QOSLevel=example1

Verify:

sacctmgr show assoc format=cluster,user,qos

Expected output:

Cluster             User        QOS
----------------    ---------   --------
test_hpc-cluster    test_user   example1

Configure GPU Resources (GRES)

To enable GPU (Graphics Processing Unit) resource tracking and allocation in SLURM, you must configure the `gres.conf` file on all nodes that have GPU devices.

Step 1: Identify GPU Devices and Minor Numbers

Use the following command to list GPU Bus IDs and their minor numbers:

nvidia-smi -q | grep -i -e 'Bus Id' -e Minor

Example output:

Minor Number   : 0
Bus Id         : 00000000:01:00.0
Minor Number   : 1
Bus Id         : 00000000:41:00.0
Minor Number   : 2
Bus Id         : 00000000:81:00.0
Minor Number   : 3
Bus Id         : 00000000:C1:00.0

Each GPU is exposed to SLURM via a device file at `/dev/nvidiaX`, where `X` is the minor number.
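
You can cross-check that a device file exists for each minor number reported above:

ls -l /dev/nvidia[0-9]*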

Step 2: Create the `gres.conf` File

Create or edit the file `/etc/slurm/gres.conf` on all relevant nodes using the following format:

NodeName=<node_name> Name=gpu Type=<gpu_model> File=/dev/nvidia<minor_number>

Example `gres.conf` file:

# Node1 has 2x P100 GPUs
NodeName=node1 Name=gpu Type=p100 File=/dev/nvidia0
NodeName=node1 Name=gpu Type=p100 File=/dev/nvidia1

# Node2 has 1x V100 GPU
NodeName=node2 Name=gpu Type=v100 File=/dev/nvidia0

# Node3 has 4x A100 GPUs
NodeName=node3 Name=gpu Type=a100 File=/dev/nvidia0
NodeName=node3 Name=gpu Type=a100 File=/dev/nvidia1
NodeName=node3 Name=gpu Type=a100 File=/dev/nvidia2
NodeName=node3 Name=gpu Type=a100 File=/dev/nvidia3

# Node4 has 4x A100 GPUs
NodeName=node4 Name=gpu Type=a100 File=/dev/nvidia0
NodeName=node4 Name=gpu Type=a100 File=/dev/nvidia1
NodeName=node4 Name=gpu Type=a100 File=/dev/nvidia2
NodeName=node4 Name=gpu Type=a100 File=/dev/nvidia3

Save this file as `/export/tmp/slurm/gres.conf` for central access, then copy it to the master, all compute nodes, and the login node.
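
For these GRES entries to be schedulable, matching Gres= counts must also be declared on the NodeName lines in `slurm.conf`. A sketch based on the node/GPU layout above (CPU and memory values are placeholders; keep your existing node definitions and only add the Gres= part):

NodeName=node1 Gres=gpu:p100:2 CPUs=32 RealMemory=128000 State=UNKNOWN
NodeName=node2 Gres=gpu:v100:1 CPUs=32 RealMemory=128000 State=UNKNOWN
NodeName=node3 Gres=gpu:a100:4 CPUs=64 RealMemory=256000 State=UNKNOWN
NodeName=node4 Gres=gpu:a100:4 CPUs=64 RealMemory=256000 State=UNKNOWN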

Step 3: Deploy `gres.conf` to All Nodes

Copy the `gres.conf` file to all compute and controller nodes:

sudo cp /export/tmp/slurm/gres.conf /etc/slurm/gres.conf

Restart SLURM services:

# On compute nodes:
sudo systemctl restart slurmd

# On master/controller node:
sudo systemctl restart slurmctld

Step 4: Verify GPU Resources in SLURM

Use the following command to ensure SLURM sees the GPU resources correctly:

sinfo --format="%P %n %f %G" --Node

Example output:

PARTITION HOSTNAMES AVAIL_FEATURES GRES
gpu-p100  node1     (null)         gpu:p100:2
gpu*      node1     (null)         gpu:p100:2
cpu       node1     (null)         gpu:p100:2
gpu-v100  node2     (null)         gpu:v100:1
gpu*      node2     (null)         gpu:v100:1
cpu       node2     (null)         gpu:v100:1
gpu*      node3     (null)         gpu:a100:4
cpu       node3     (null)         gpu:a100:4
gpu-a100  node3     (null)         gpu:a100:4
gpu*      node4     (null)         gpu:a100:4
cpu       node4     (null)         gpu:a100:4
gpu-a100  node4     (null)         gpu:a100:4
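
You can also inspect an individual node; the Gres value reported by scontrol should match gres.conf:

scontrol show node node1 | grep -i gres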


Step 5: Test GPU Resource Selection with SLURM

You can test if GPU selection is working correctly using:

srun --nodelist=node1 --gres=gpu:p100:2 --pty bash

Inside the session:

echo $CUDA_VISIBLE_DEVICES

Expected output:

0,1

This indicates that SLURM correctly allocated two GPUs for the job.
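
The same request can be made non-interactively. A minimal batch-script sketch (the script name, job name and output file are assumptions):

#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --nodelist=node1
#SBATCH --gres=gpu:p100:2
#SBATCH --output=gpu-test.%j.out

# GPUs assigned to this job by Slurm
echo $CUDA_VISIBLE_DEVICES
nvidia-smi -L

Submit it with:

sbatch gpu-test.sh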

6. Apply Per-User Resource Limits using QOS

Limit CPU and GPU usage per user:

sacctmgr modify qos example1 set MaxTRESPerUser=cpu=2,gres/gpu=1

Limit maximum wall time:

sacctmgr modify qos example1 set MaxWall=12:00:00

Set maximum number of jobs per account:

sacctmgr modify qos example1 set MaxJobsPA=10
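
To check the limits that are now in force on the QOS (MaxTRESPU, MaxWall and MaxJobsPA are standard sacctmgr format fields):

sacctmgr show qos example1 format=name,maxtrespu,maxwall,maxjobspa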

Refer to official documentation: https://slurm.schedmd.com/resource_limits.html

7. Test the QOS and Limits

Try submitting a job that exceeds the per-user GPU limit (request the QOS explicitly so the limit applies):

srun --gres=gpu:2 --cpus-per-task=1 --account=account1 --qos=example1 -p Test_partition1 --pty bash

Expected message (queued due to limit):

srun: job 216 queued and waiting for resources
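
To see why the job is held back, check its pending reason; with the limits above it is typically a QOS-related reason (the exact string depends on the Slurm version):

squeue -u test_user -o "%.8i %.10P %.10T %r"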

8. Manage Node States

Check node status:

sinfo -l

If any node is `DOWN` or `DRAINED`, bring it back up:

scontrol update NodeName=<node-name> state=idle
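
Conversely, to take a node out of service for maintenance (the reason text is free-form):

scontrol update NodeName=<node-name> state=drain reason="maintenance"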
