
I have an R analysis composed of three parts (partA, partB, and partC). I submit each part to SLURM (e.g. sbatch partA), and each part is parallelized via #SBATCH --array=1-1500. The parts must run serially, so I need to wait for each one to finish before starting the next. Right now I'm starting each job manually, which is not a great solution.

I would like to automate the three sbatch calls. For example:

  1. sbatch partA
  2. when partA is done, sbatch partB
  3. when partB is done, sbatch partC

I used this solution to get the job ID of partA and pass it to strigger, which accomplishes step 2 above. However, I'm stuck at that point because I don't know how to get the job ID of partB from strigger. Here's what my code looks like:

#!/bin/bash

# step 1: sbatch partA
partA_ID=$(sbatch --parsable partA.sh)

# step 2: sbatch partB
strigger --set --jobid=$partA_ID --fini --program=/path/to/partB.batch

# step 3: sbatch partC
... ?

How do I complete step 3?

R Greg Stacey

1 Answer


strigger is not the proper tool to achieve that goal; it is aimed more at administrators than at regular users. Only the slurm user can actually set triggers (see the "Important note" in the strigger manpage).

In your case, you should submit all three jobs at once, with dependencies set among them.

For instance:

$ partA_ID=$(sbatch --parsable partA.sh)
$ partB_ID=$(sbatch --parsable --dependency=afterany:${partA_ID} partB.sh)
$ partC_ID=$(sbatch --parsable --dependency=afterany:${partB_ID} partC.sh)

This will submit three job arrays, but the second one will only start when all jobs in the first one have terminated (with afterany, they may have succeeded, failed, or been cancelled). The third one will likewise only start when all jobs in the second one have terminated.
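If you want a single entry point, the three submissions can go in one wrapper script. Here is a minimal sketch; the file name submit_all.sh is arbitrary, and set -e is simply there to abort the script if any sbatch call fails:

#!/bin/bash
# submit_all.sh: submit all three job arrays at once, chained with
# afterany dependencies.
set -e

partA_ID=$(sbatch --parsable partA.sh)
partB_ID=$(sbatch --parsable --dependency=afterany:${partA_ID} partB.sh)
partC_ID=$(sbatch --parsable --dependency=afterany:${partB_ID} partC.sh)

echo "Submitted partA=${partA_ID} partB=${partB_ID} partC=${partC_ID}"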

An alternative dependency type is aftercorr:

$ partA_ID=$(sbatch --parsable partA.sh)
$ partB_ID=$(sbatch --parsable --dependency=aftercorr:${partA_ID} partB.sh)
$ partC_ID=$(sbatch --parsable --dependency=aftercorr:${partB_ID} partC.sh)

This will also submit three job arrays, but each job of the second one will not start until the corresponding job of the first one (i.e. the job with the same $SLURM_ARRAY_TASK_ID) has completed successfully. Likewise, each job of the third one will start only when the corresponding job of the second one has completed.
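For aftercorr to pair tasks up meaningfully, each task should work on the data chunk matching its own index. Since the question mentions R, here is a sketch of what partB.sh could look like; the R script name partB.R and its argument handling are assumptions, not something from the question:

#!/bin/bash
#SBATCH --array=1-1500

# With aftercorr, task N of this array starts as soon as task N of
# partA has completed successfully; each task processes chunk N.
Rscript partB.R ${SLURM_ARRAY_TASK_ID}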

For more details, see the --dependency section in the sbatch manpage.
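Note in particular that afterany fires even when some tasks of the previous array failed. If partB should run only when every partA task exits with status zero, afterok is the stricter variant documented there, e.g.:

$ partB_ID=$(sbatch --parsable --dependency=afterok:${partA_ID} partB.sh)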

damienfrancois
  • Are `partB.sh` and `partC.sh` missing from the second and third lines? e.g. Should it be `$ partB_ID=$(sbatch --parsable --dependency=afterany:${partA_ID} partB.sh)`? I didn't explicitly mention them in my post, so that might be the confusion. – R Greg Stacey Apr 26 '18 at 20:52
  • Yes, you need to specify the script to submit – Carles Fenoy Apr 26 '18 at 21:02
  • Awesome. I'll leave this open for now in case anyone wants to add anything, but I think you answered my question! Super appreciated :) – R Greg Stacey Apr 26 '18 at 21:04
  • @RGregStacey you are correct about the missing submission script names, I updated my answer. – damienfrancois Apr 26 '18 at 21:14