
I have a Slurm script to run my Python code:

#!/bin/bash -l                                                                                                    
#SBATCH --nodes=1                                                                                                 
#SBATCH --ntasks=1                                                                                                
#SBATCH --cpus-per-task=1                                                                                         
#SBATCH --mem=10G                                                                                                 
#SBATCH --account=my_account                                                                                 
#SBATCH --qos=default                                                                                           
#SBATCH --time=2-00:00:00                                                                                         
###Array setup here                                                                                               
#SBATCH --array=1                                                                                                 
#SBATCH --open-mode=truncate                                                                                      
#SBATCH --output=out_files/output.o                                                                              

module purge
module load my_cluster
module load Miniconda3/4.9.2

eval "$(${EBROOTMINICONDA3}/bin/conda shell.bash hook)"

conda activate my_conda_env

cd /my_directory

python my_python_code.py -filename file_a.txt

This works, but at the moment, it just launches 1 job and uses file_a.txt as an argument.

How can I launch 10 simultaneous jobs? I know I can use:

#SBATCH --array=1-10  

but I want to use file_a.txt as the argument for job 1, file_b.txt as the argument for job 2, etc.

If possible, I would like to provide the list of file names as a separate text file, which is read by the Slurm script.


1 Answer

As per the docs, the SLURM_ARRAY_TASK_ID environment variable is set to the (1-indexed) task ID of each array task. We can use this variable with sed to pull the Nth line from a list of files.

my_files.txt

file_a.txt
file_b.txt
file_c.txt

Credit to this answer for the sed -n "xp" command.
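
To see what the lookup does outside of Slurm, you can set the task ID by hand. A minimal sketch, assuming the my_files.txt above is in the working directory:

```shell
# Recreate the file list from above
printf 'file_a.txt\nfile_b.txt\nfile_c.txt\n' > my_files.txt

# Simulate what Slurm would set for the second array task
SLURM_ARRAY_TASK_ID=2

# sed -n suppresses normal output; "2p" prints only line 2
file_name=$(sed -n "${SLURM_ARRAY_TASK_ID}p" < my_files.txt)
echo "${file_name}"   # file_b.txt
```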

my_slurm_job.sh

#!/bin/bash -l                                                                                                    
#SBATCH --nodes=1                                                                                                 
#SBATCH --ntasks=1                                                                                                
#SBATCH --cpus-per-task=1                                                                                         
#SBATCH --mem=10G                                                                                                 
#SBATCH --account=my_account                                                                                 
#SBATCH --qos=default                                                                                           
#SBATCH --time=2-00:00:00                                                                                         
###Array setup here                                                                                               
#SBATCH --array=1-3
#SBATCH --open-mode=truncate                                                                                      
#SBATCH --output=out_files/%a_output.o                                                                              

module purge
module load my_cluster
module load Miniconda3/4.9.2

eval "$(${EBROOTMINICONDA3}/bin/conda shell.bash hook)"

conda activate my_conda_env

cd /my_directory

# Get the Nth line from my_files.txt
file_name=$(sed -n "${SLURM_ARRAY_TASK_ID}p" < my_files.txt)

python my_python_code.py -filename "${file_name}"

Edited to add the Task ID to the output file name as per FlyingTeller's comment and the Slurm docs.
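
As an aside, the same mapping can be done with a bash array instead of sed. A sketch, not part of the original answer, assuming bash (mapfile is a bash builtin, not available in plain sh):

```shell
# Recreate the file list from above
printf 'file_a.txt\nfile_b.txt\nfile_c.txt\n' > my_files.txt

SLURM_ARRAY_TASK_ID=2

# Read all lines into a bash array; -t strips trailing newlines
mapfile -t files < my_files.txt

# Bash arrays are 0-indexed, while task IDs here start at 1
file_name="${files[$((SLURM_ARRAY_TASK_ID - 1))]}"
echo "${file_name}"   # file_b.txt
```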

  • :-) feel free to improve the answer if I've missed anything out. – FiddleStix Jul 26 '23 at 14:18
  • Might add that it can be a good idea to add `%a` to the output file, so `#SBATCH --output=out_files/output.%a.o`, which creates one output file per array task – FlyingTeller Jul 26 '23 at 14:21
  • Thanks! So firstly, I think I need to add a space after `file_name=` otherwise I get an error message about `-n: command not found`. But then if I have a space I don't think anything is assigned to `file_name`. – user1551817 Jul 26 '23 at 15:02
  • I believe it was missing `$(...)` around the sed part - I edited the answer and it works now, thank you! – user1551817 Jul 26 '23 at 15:39