
I am trying to set a constraint so that my job only runs on GPUs with a compute capability of 7 or higher.

Here is my script named torch_gpu_sanity_venv385-11.slurm:

#!/bin/bash
#SBATCH --partition=gpu-L --gres=gpu:1 --constraint="cc7.0" 
# -------------------------> ask for 1 GPU
d=$(date)
h=$(hostname)
echo $d $h env         # show CUDA related Env vars 
env|grep -i cuda
# nvidia-smi
#          actual work
/research/jalal/slurm/fashion/fashion_compatibility/torch_gpu_sanity_venv385-11.bash 

Without --constraint="cc7.0" my script runs correctly. I also tried the unquoted form, --constraint=cc7.0, but in either case I get the following error:

[jalal@goku fashion_compatibility]$ sbatch torch_gpu_sanity_venv385-11.slurm 
sbatch: error: Batch job submission failed: Invalid feature specification

When I remove the --constraint="cc7.0" option, the job is submitted successfully:

[jalal@goku fashion_compatibility]$ sbatch torch_gpu_sanity_venv385-11.slurm 
Submitted batch job 28398

So, how can I set the constraint so that I am only assigned GPUs with compute capability of 7 or higher?

I followed this tutorial for constraint setting.

Mona Jalal
  • The tutorial you are following gives instructions for a specific cluster, where the administrators have added node-specific features denoting the compute capability of the GPUs in each node so that they can be used as constraints. Your cluster has clearly not been configured in the same fashion. What you are seeking to use is not a built-in feature of SLURM; it is something that needs to be added to the cluster configuration. – talonmies Sep 15 '21 at 03:55
  • @talonmies Thanks a lot for your reply. This was very helpful. Indeed, I am able to set a constraint for 12G GPU memory (but not this one). – Mona Jalal Sep 17 '21 at 18:59
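
Following on from the comments above: since --constraint only accepts feature names that the administrators have defined on the nodes, one way to see why "cc7.0" is rejected is to list the features the cluster actually advertises. The sketch below is a minimal illustration, not a definitive recipe. It assumes Slurm's sinfo is on the PATH; has_feature is a hypothetical helper name introduced here, and "cc7.0" is only the feature name from the question (each site chooses its own names).

```shell
#!/bin/bash
# has_feature FEATURE
# Hypothetical helper (not part of Slurm). Reads feature lists on stdin,
# in the form printed by `sinfo -h -o "%f"` (comma-separated names, one
# line per group of nodes), and succeeds only if FEATURE is one of them.
has_feature() {
    local wanted="$1"
    # Split on commas and spaces so each feature name sits on its own line,
    # then look for an exact, literal, whole-line match.
    tr ', ' '\n\n' | grep -qxF -- "$wanted"
}

# Usage on a cluster (needs sinfo; output depends on the site configuration):
#   sinfo -h -o "%N %f %G"       # node names, their features, their GRES
#   if sinfo -h -o "%f" | has_feature "cc7.0"; then
#       echo "cc7.0 is defined; --constraint=cc7.0 should be accepted"
#   else
#       echo "cc7.0 is not defined on any node"
#   fi
```

If the name is missing from the sinfo output, sbatch will keep reporting "Invalid feature specification" for it, and the options are to use one of the feature names that is listed or to ask the administrators to add one for compute capability.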

0 Answers