
I am completely new to using SLURM to submit jobs to an HPC, and I am facing a peculiar problem that I am not able to resolve.

I have a job.slurm file that contains the following bash script:

#!/bin/bash
#SBATCH --job-name singularity-mpi
#SBATCH -N 1 # total number of nodes
#SBATCH --time=00:05:00 # Max execution time
#SBATCH --partition=partition-name
#SBATCH --output=/home/users/r/usrname/slurm-reports/slurm-%j.out

module load GCC/9.3.0 Singularity/3.7.3-Go-1.14 CUDA/11.0.2 OpenMPI/4.0.3

binaryPrecision=600 #Temporary number

while getopts i:o: flag
do
        case "${flag}" in
                i) input=${OPTARG}
                        ;;
                o) output=${OPTARG}
                        ;;
                *) echo "Invalid option: -$flag" ;;
        esac
done

mpirun --allow-run-as-root singularity exec --bind /home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/ sdpb_2.5.1.sif pvm2sdp $binaryPrecision /usr/local/share/sdpb/$input /usr/local/share/sdpb/$output

The pvm2sdp command is a C++ executable (written for MPI) that converts an XML file to a JSON file.

If I submit the .slurm file as

sbatch ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json

it works perfectly. However, if I instead submit it using srun as

srun ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json

I get the following error:

--------------------------------------------------------------------------
A call to mkdir was unable to create the desired directory:

  Directory: /scratch
  Error:     Read-only file system

Please check to ensure you have adequate permissions to perform
the desired operation.
--------------------------------------------------------------------------

I have no clue why this is happening or how to go about resolving it. I tried to mount /scratch as well, but that does not resolve the issue.

Any help would be greatly appreciated, since I need to use srun inside another .slurm file that contains multiple other MPI calls.

  • Use sbatch to run the SLURM script. Use srun in place of mpirun INSIDE the script. – Qubit Jul 03 '22 at 17:04
  • @Qubit just so I understand, you suggest removing mpirun and using srun inside the .slurm file instead? But pvm2sdp is an executable specifically written for MPI and doing that replacement gives `The application appears to have been direct launched using "srun", but OMPI was not built with SLURM's PMI support and therefore cannot execute.` – Bharath Radhakrishnan Jul 03 '22 at 17:36
  • TL;DR `srun ... mpirun ...` is **not** gonna work. – Gilles Gouaillardet Jul 04 '22 at 03:54
  • @xcodeking srun is typically the preferred way of running mpi jobs with SLURM, however as you correctly note it requires some configuration, typically most clusters do have this and I have personally not had issues with this (although there are clusters that further wrap SLURM for one reason or another). Of course there are ways around this, see e.g. the second result on Google (will download a PDF): https://www.hlrn.de/doc/download/attachments/9667587/HLRN%20Anwenderschulung%202020%20-%20srunvsmpirun.pdf?version=1&modificationDate=1604515334919&api=v2 – Qubit Jul 04 '22 at 05:24
  • Unfortunately I suspect none of these will work for your particular error, it may be best to ask the administrator who is familiar with the configuration of the server. They are more likely to be able to assist with errors regarding permissions and/or configuration of libraries such as OMPI. – Qubit Jul 04 '22 at 05:30
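
For reference, a minimal sketch of the pattern suggested in the comments above: submit a single batch script with sbatch and launch every MPI step from inside it. Whether a step can be launched with srun instead of mpirun depends on whether the cluster's OpenMPI was built with SLURM's PMI/PMIx support, which the comments suggest may not be the case here; the module versions, paths, and file names below are simply copied from the question as placeholders.

#!/bin/bash
#SBATCH --job-name multi-step-mpi
#SBATCH -N 1                      # total number of nodes
#SBATCH --time=00:30:00           # max execution time
#SBATCH --partition=partition-name

module load GCC/9.3.0 Singularity/3.7.3-Go-1.14 CUDA/11.0.2 OpenMPI/4.0.3

BIND=/home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/

# Step 1: convert the XML input to JSON, launched with mpirun as in the question
mpirun singularity exec --bind $BIND sdpb_2.5.1.sif pvm2sdp 600 /usr/local/share/sdpb/xmlfile.xml /usr/local/share/sdpb/jsonfile.json

# Further MPI steps go in the same script; with a PMI/PMIx-enabled OpenMPI they
# could be launched as "srun singularity exec ...", but never as "srun ... mpirun ..."

The script would then be submitted once with sbatch ./multi-step.slurm rather than with srun.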

1 Answer


I generally use srun after salloc. Let's say I have to run a Python file on a GPU. I will use salloc to allocate a compute node.

salloc --nodes=1 --account=sc1901 --partition=accel_ai_mig --gres=gpu:2

Then I use this command to directly access the shell of the compute node.

srun --pty bash

Now you can type any command as you would on your own PC. You can try nvidia-smi, or run Python files with python code.py.

In your case, you can simply load the modules manually and then run your mpirun command after srun --pty bash. You don't need the job script.
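
Applied to the question's setup, the interactive route would look roughly like this (a sketch only; the partition name, module versions, and paths are taken from the question, and the salloc options will differ from cluster to cluster):

salloc -N 1 --partition=partition-name --time=00:30:00   # allocate a compute node
srun --pty bash                                          # open a shell on that node

# inside the interactive shell on the compute node:
module load GCC/9.3.0 Singularity/3.7.3-Go-1.14 CUDA/11.0.2 OpenMPI/4.0.3
mpirun singularity exec --bind /home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/ sdpb_2.5.1.sif pvm2sdp 600 /usr/local/share/sdpb/xmlfile.xml /usr/local/share/sdpb/jsonfile.json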

One more thing: sbatch and srun are configured differently on each HPC, so we can't say exactly what is stopping you from running those commands.

At Swansea University, we are expected to use job scripts with sbatch only. Have a look at my university's HPC tutorial.

Read this article to learn the primary differences between the two.

Prakhar Sharma