When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU that is allocated to the job? Is there an environment variable for this purpose? The GPUs I'm using are all NVIDIA GPUs. Thanks.

talonmies
Negelis

3 Answers


You can get the GPU ID from the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma-separated list of the GPU IDs assigned to the job.
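
A minimal sketch of how this could be read inside a batch script submitted with --gres=gpu:1 (the echo text is just illustrative):

    #!/bin/bash
    #SBATCH --gres=gpu:1

    # CUDA_VISIBLE_DEVICES is set by Slurm to the id(s) of the allocated GPU(s),
    # e.g. "0" or "1" on a two-GPU node; CUDA applications started from this
    # job will only use the device(s) listed here
    echo "Allocated GPU(s): $CUDA_VISIBLE_DEVICES"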

Carles Fenoy
  • It works. Thanks. It seems that the environment variable GPU_DEVICE_ORDINAL also works. – Negelis May 14 '17 at 20:40
  • This doesn't identify the GPU uniquely when using cgroups. With cgroups, CUDA_VISIBLE_DEVICES would be 0 for all GPUs because each process only sees a single GPU (others are hidden by the cgroup). – isarandi Jun 12 '19 at 15:44

You can check the environment variables SLURM_STEP_GPUS or SLURM_JOB_GPUS for a given node:

echo ${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}

Note that CUDA_VISIBLE_DEVICES may not correspond to the physical GPU ID (see @isarandi's comment above).

Also note that this should work for non-NVIDIA GPUs as well.
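
As a sketch, the same fallback dropped into a batch script (the gpu_id variable name is just illustrative):

    #!/bin/bash
    #SBATCH --gres=gpu:1

    # SLURM_STEP_GPUS is set within job steps (srun), SLURM_JOB_GPUS at the
    # job level; use whichever is available
    gpu_id=${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}
    echo "GPU id(s) allocated on this node: $gpu_id"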

bryant1410

Slurm stores this information in an environment variable, SLURM_JOB_GPUS.

One way to keep track of such information is to log all SLURM-related variables when running a job, for example by including the following command in the script run by sbatch (following Kaldi's slurm.pl, which is a handy script for wrapping Slurm jobs):

set | grep SLURM | while read line; do echo "# $line"; done
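
For example, a sketch of a batch script that records these variables at the top of its log (the --output pattern is only illustrative):

    #!/bin/bash
    #SBATCH --gres=gpu:1
    #SBATCH --output=job_%j.log

    # Dump every SLURM_* variable (including SLURM_JOB_GPUS) as commented lines,
    # so the allocated GPU ids are recorded alongside the job's output
    set | grep SLURM | while read line; do echo "# $line"; done

    # ... rest of the job ...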
leilu