I have a script that splits a data structure into chunks. The chunks are processed using a torque job array and then merged back into a single structure.

The merge operation is dependent on the job array completing. How do I make the merge operation wait for the torque job array to complete?

$ qsub --version
Version: 4.1.6

My script is as follows:

# Splits the data structure and processes the chunks
qsub -t 1-100 -l nodes=1:ppn=40,walltime=48:00:00,vmem=120G ./job.sh
# Merges the processed chunks back into a single structure
./merge.sh

I have tried:

qsub -t 1-100 -l nodes=1:ppn=40,walltime=48:00:00,vmem=120G -N job1 ./job.sh
qsub -W depend=afterokarray:job1 ./merge.sh

and also:

qsub -t 1-100 -l nodes=1:ppn=40,walltime=48:00:00,vmem=120G -N job1 ./job.sh
qsub -hold_jid job1 ./merge.sh

Neither worked. The former failed with the error [qsub: illegal -W value], and the latter failed with [qsub: script file 'job1' cannot be loaded - No such file or directory].

Josh
  • You are not using the `afterokarray` syntax correctly. It should be -W depend=afterokarray:12345[] where 12345[] is the array job ID returned by the preceding `qsub`. See also [here on SO](http://stackoverflow.com/a/18463349/1328439). – Dima Chubarov Oct 23 '13 at 02:13
  • @Josh - Did you ever solve the issue with the "'job1' cannot be loaded" error when using the -hold_jid flag? I'm currently trying to implement this same feature and am running into the same error. Thanks! – Ryan G Mar 03 '15 at 18:44

2 Answers

The output of

qsub -t 1-100 -l nodes=1:ppn=40,walltime=48:00:00,vmem=120G -N job1 ./job.sh

contains the job ID, so the following should work in bash:

FIRST=`qsub first_1.sh`
qsub -W depend=afterok:$FIRST second_1.sh
Bort
  • This got rid of the error relating to the illegal -W value, but the second script (second_1.sh in your example) did not seem to run. – Josh Oct 22 '13 at 13:27

The answer

You should use afterokarray and pass it the array job ID that the first qsub prints:

ID=$(qsub -t 1-100 -l nodes=1:ppn=40,walltime=48:00:00,vmem=120G -N job1 ./job.sh)
qsub -W depend=afterokarray:"$ID" ./merge.sh
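
The two commands above can be wrapped in a small script that also checks the captured ID before using it. This is only a sketch: it assumes qsub prints the array job ID on stdout in the form 12345[].server (as in the comment on the question), and the function names array_dependency and submit_split_and_merge are made up for illustration.

```shell
#!/bin/bash
# Sketch only. Assumes qsub prints the array job ID (e.g. 12345[].server)
# on stdout. Function names are hypothetical, not part of Torque.

# Build the dependency string for an array job ID. Fails if the ID does
# not contain "[]", because afterokarray expects an array job ID.
array_dependency() {
    local job_id=$1
    case $job_id in
        *'[]'*) printf 'afterokarray:%s\n' "$job_id" ;;
        *) echo "not an array job ID: '$job_id'" >&2; return 1 ;;
    esac
}

# Submit job.sh as a 100-task array, then submit merge.sh so that it
# starts only after every task of the array has finished successfully.
submit_split_and_merge() {
    local array_id dep
    array_id=$(qsub -t 1-100 -l nodes=1:ppn=40,walltime=48:00:00,vmem=120G -N job1 ./job.sh) || return 1
    dep=$(array_dependency "$array_id") || return 1
    qsub -W depend="$dep" ./merge.sh
}
```

Calling submit_split_and_merge at the end of the script performs both submissions. Quoting the ID matters because it contains square brackets, which the shell could otherwise treat as a glob pattern.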

Another example

Here is another example. Say you need to run a job twice and then, once both runs have finished, run another job:

#!/bin/bash
#PBS -q batch
#PBS -l walltime=24:00:00
#PBS -o /app/run/KLFLO/nueva_tarea_0.$PBS_ARRAYID.out
#PBS -e /app/run/KLFLO/nueva_tarea_0.$PBS_ARRAYID.err
#PBS -N KLFLO_nueva_tarea_0
#PBS -t 1-2 # times 
sleep 20
/bin/cat /etc/hosts

Submit it with qsub < nueva_tarea_2.bash, and then use the array job ID that qsub returns (10[].docker here) in the other submission file:

#!/bin/bash
#PBS -q batch
#PBS -l walltime=24:00:00
#PBS -o /app/run/KLFLO/nueva_tarea_2.$PBS_ARRAYID.out
#PBS -e /app/run/KLFLO/nueva_tarea_2.$PBS_ARRAYID.err
#PBS -N KLFLO_nueva_tarea_2
#PBS -t 1-2 # times
#PBS -W depend=afterokarray:10[].docker
/bin/cat /etc/hosts
Carlochess