
I have a directory `main` that contains around 100 subdirectories. For example, it looks like this:

main
 |__ test_1to50000
 |__ test_50001to60000
 |__ test_60001to70000
 |__ test_70001to80000
 |__ test1.sh

I have an sbatch script `test1.sh` that runs on the first directory.

#!/bin/bash

#SBATCH --job-name=sbatchJob   
#SBATCH --cpus-per-task=16       
#SBATCH --mem-per-cpu=8G    
#SBATCH --time=1-00:00:00
#SBATCH --qos=1day
if [ -f ~/.bashrc ] ; then
    . ~/.bashrc
fi

module load Perl/5.28.0-GCCcore-8.2.0

perl path/to/software --cpu 16 --run /path/to/test_1to50000 command /path/to/test_1to50000/software.`date +"%m_%d_%y_%H-%M-%S"`.log

Since I have 100 directories, I would like to create a script like this for each directory and submit all of them. How can I generate sbatch scripts like the one above for all the other directories?

beginner
  • You may find some help at [Pass command line arguments via sbatch](https://stackoverflow.com/questions/27708656/pass-command-line-arguments-via-sbatch). If worst comes to worst, you can run some variant of `for dir in *; do sed "s%@dir@%$dir%g" template.file > sbatch.$dir; done` to create the scripts from a template file which contains `@dir@` markers where you want to place the (sub)directory name. Using `%` instead of `/` in the `sed` command works better if you need to work with pathnames — as long as you don't use `%` anywhere in your file or directory names. – Jonathan Leffler Nov 14 '20 at 16:04
  • The [`sbatch`](https://slurm.schedmd.com/sbatch.html) manual indicates that arguments can be provided to `sbatch`. You should be able to create one file and then invoke that with appropriate arguments: `for dir in *; do sbatch … $dir; done`. I've not investigated the details or caveats — but that would be the normal way Unix (POSIX, Linux, …) systems would work. – Jonathan Leffler Nov 14 '20 at 16:07
  • @JonathanLeffler thanks for the comment. Yes, I am able to create a script for each directory, but inside each script I also need to change the directory name, and this needs to be done for every script. For example, in the sbatch script `test1.sh` above, the command at the end also has the path to the directory. So, how do I do that? – beginner Nov 14 '20 at 16:39
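
For illustration, a minimal sketch of the argument-passing approach suggested in the comments: one generic job script that takes the target directory as its first argument. The name `run_one.sh` is an assumption; the Perl command is copied from the question.

#!/bin/bash
#SBATCH --job-name=sbatchJob
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=8G
#SBATCH --time=1-00:00:00
#SBATCH --qos=1day

# The directory to process is passed as the first argument by sbatch
DIR="$1"

module load Perl/5.28.0-GCCcore-8.2.0

perl path/to/software --cpu 16 --run "$DIR" command "$DIR"/software.$(date +"%m_%d_%y_%H-%M-%S").log

Submitting one job per subdirectory is then a single loop (the trailing slash in `main/*/` matches directories only, so `test1.sh` is skipped):

for dir in main/*/; do
    sbatch run_one.sh "$dir"
done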

1 Answer


Your best option is to use a job array with a script like this:

#!/bin/bash
#SBATCH --array=0-3   # 3 == number of dirs - 1
#SBATCH --job-name=sbatchJob   
#SBATCH --cpus-per-task=16       
#SBATCH --mem-per-cpu=8G    
#SBATCH --time=1-00:00:00
#SBATCH --qos=1day
if [ -f ~/.bashrc ] ; then
    . ~/.bashrc
fi

module load Perl/5.28.0-GCCcore-8.2.0
DIRS=(main/*/)    # This array will hold all directories
CURRDIR="${DIRS[$SLURM_ARRAY_TASK_ID]}" # This is the directory taken care of by the current job

perl path/to/software --cpu 16 --run "$CURRDIR" command "$CURRDIR"/software.$(date +"%m_%d_%y_%H-%M-%S").log

This will create a job array with one job per directory. You will need to set the `--array` range to match the number of directories, but then you can manage all the jobs with a single command, get a single email when every job has finished, and it eases the work of the scheduler a lot.
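
If you would rather not hard-code the range, here is a sketch of one way to size the array at submission time (assuming the job script above is saved under a name such as `array_job.sh`, which is hypothetical; options given on the `sbatch` command line take precedence over the `#SBATCH` directives in the script):

# Count the subdirectories of main/ and submit a matching array
N=$(ls -d main/*/ | wc -l)
sbatch --array=0-$((N-1)) array_job.sh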

damienfrancois