
For example, I have a script called myScript that processes one input file, and I have a list of filenames. That is, I need to run

$ myScript <filename>

for each filename stored in filenames.txt.

The only way I have found to distribute jobs in Slurm is the -n parameter, which simply duplicates the command or batch script. But note that I need to pass a different parameter into each run. Is it possible to do so in Slurm?

My current solution is to submit a lot of sbatch jobs, one per <filename>, with a loop like the one sketched below. However, this way squeue shows a whole lot of my jobs, and I'm afraid this is frowned upon by other users.
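
For reference, a minimal sketch of that loop approach (untested; it assumes myScript is on the PATH and filenames.txt sits in the submission directory):

#! /bin/bash
# Submit one separate Slurm job per line of filenames.txt.
# This works, but floods the queue with individual jobs.
while read -r filename; do
    sbatch --wrap="myScript '$filename'"
done < filenames.txt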


1 Answer


One option is using job arrays. Prepare a two-line submission script (untested) like this:

#! /bin/bash
#SBATCH --array=1-<number of lines in filenames.txt>
# Each array task extracts the line of filenames.txt matching its task ID.
myScript "$(tail -n+$SLURM_ARRAY_TASK_ID filenames.txt | head -n1)"

and submit it with sbatch mySubmissionScript.sh. It will create a job array with one job per line in the file, each job running myScript on the filename written at line SLURM_ARRAY_TASK_ID. You just need to replace <number of lines in filenames.txt> with the actual number of lines in the file, as given by wc -l filenames.txt for instance.
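
If you would rather not hardcode that count, the array range can also be supplied on the command line, where it overrides the directive in the script (an untested sketch, using the same file and script names as above):

sbatch --array=1-$(wc -l < filenames.txt) mySubmissionScript.sh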

Job arrays are shown in compressed form in the output of Slurm's squeue command, with all pending jobs displayed on one line only. You can also limit the number of simultaneously running jobs with

--array=1-16%4

Slurm will then allow only 4 jobs from that array to run at a time.
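
In the submission script itself, the throttle goes on the same directive. For instance, if filenames.txt happened to have 16 lines, the script above would become (again untested):

#! /bin/bash
#SBATCH --array=1-16%4
# At most 4 of the 16 array tasks run at once; the rest stay pending.
myScript "$(tail -n+$SLURM_ARRAY_TASK_ID filenames.txt | head -n1)"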
