
I have a Slurm script to run my Python code:

#!/bin/bash -l                                                                                                    
#SBATCH --nodes=1                                                                                                 
#SBATCH --ntasks=1                                                                                                
#SBATCH --cpus-per-task=1                                                                                         
#SBATCH --mem=10G                                                                                                 
#SBATCH --account=my_account                                                                                 
#SBATCH --qos=default                                                                                           
#SBATCH --time=2-00:00:00                                                                                         
###Array setup here                                                                                               
#SBATCH --array=1                                                                                                 
#SBATCH --open-mode=truncate                                                                                      
#SBATCH --output=out_files/output.o                                                                              

module purge
module load my_cluster
module load Miniconda3/4.9.2

eval "$(${EBROOTMINICONDA3}/bin/conda shell.bash hook)"

conda activate my_conda_env

cd /my_directory

python my_python_code.py -filename file_a.txt

This works, but at the moment, it just launches 1 job and uses file_a.txt as an argument.

How can I launch 10 simultaneous jobs? I know I can use:

#SBATCH --array=1-10  

but I want to use file_a.txt as the argument for job 1, file_b.txt as the argument for job 2, etc.

If possible, I would like to provide the list of file names as a separate text file, which is read by the Slurm script.


1 Answer

As per the docs, the SLURM_ARRAY_TASK_ID environment variable is set to the (1-indexed) task ID of each array task. We can use this variable with sed to pull the Nth line from a list of files.

my_files.txt

file_a.txt
file_b.txt
file_c.txt

Credit to this answer for the sed -n "xp" command.
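
To see what the lookup does outside of Slurm, you can set the task ID by hand. A minimal sketch, assuming the my_files.txt above is in the working directory:

```shell
# Recreate the file list from above
printf 'file_a.txt\nfile_b.txt\nfile_c.txt\n' > my_files.txt

# Simulate what Slurm would set for the second array task
SLURM_ARRAY_TASK_ID=2

# sed -n suppresses normal output; "2p" prints only line 2
file_name=$(sed -n "${SLURM_ARRAY_TASK_ID}p" < my_files.txt)
echo "${file_name}"   # file_b.txt
```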

my_slurm_job.sh

#!/bin/bash -l                                                                                                    
#SBATCH --nodes=1                                                                                                 
#SBATCH --ntasks=1                                                                                                
#SBATCH --cpus-per-task=1                                                                                         
#SBATCH --mem=10G                                                                                                 
#SBATCH --account=my_account                                                                                 
#SBATCH --qos=default                                                                                           
#SBATCH --time=2-00:00:00                                                                                         
###Array setup here                                                                                               
#SBATCH --array=1-3
#SBATCH --open-mode=truncate                                                                                      
#SBATCH --output=out_files/%a_output.o                                                                              

module purge
module load my_cluster
module load Miniconda3/4.9.2

eval "$(${EBROOTMINICONDA3}/bin/conda shell.bash hook)"

conda activate my_conda_env

cd /my_directory

# Get the Nth line from my_files.txt
file_name=$(sed -n "${SLURM_ARRAY_TASK_ID}p" < my_files.txt)

python my_python_code.py -filename "${file_name}"

Edited to add the Task ID to the output file name as per FlyingTeller's comment and the Slurm docs.
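
As an aside, the same mapping can be done with a bash array instead of sed. A sketch, not part of the original answer, assuming bash (mapfile is a bash builtin, not available in plain sh):

```shell
# Recreate the file list from above
printf 'file_a.txt\nfile_b.txt\nfile_c.txt\n' > my_files.txt

SLURM_ARRAY_TASK_ID=2

# Read all lines into a bash array; -t strips trailing newlines
mapfile -t files < my_files.txt

# Bash arrays are 0-indexed, while task IDs here start at 1
file_name="${files[$((SLURM_ARRAY_TASK_ID - 1))]}"
echo "${file_name}"   # file_b.txt
```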

  • :-) feel free to improve the answer if I've missed anything out. – FiddleStix Jul 26 '23 at 14:18
  • Might add that it can be a good idea to add `%a` to the output file, so `#SBATCH --output=out_files/output.%a.o`, which creates one output file per array task – FlyingTeller Jul 26 '23 at 14:21
  • Thanks! So firstly, I think I need to add a space after `file_name=` otherwise I get an error message about `-n: command not found`. But then if I have a space I don't think anything is assigned to `file_name`. – user1551817 Jul 26 '23 at 15:02
  • I believe it was missing `$(...)` around the sed part - I edited the answer and it works now, thank you! – user1551817 Jul 26 '23 at 15:39