
I submit jobs to a cluster (a high-performance computer) using file1.sh and file2.sh.

The content of file1.sh is

qsub job1.sh
qsub job2.sh
qsub job3.sh
...
qsub job999.sh
qsub job1000.sh

The content of file2.sh is

qsub job1001.sh
qsub job1002.sh
qsub job1003.sh
...
qsub job1999.sh
qsub job2000.sh

After typing ./file1.sh in PuTTY, job1 to job1000 are submitted.

Is there an automatic way to run ./file2.sh ONLY after the first 1000 jobs have completed? Please note, I want ./file2.sh to run automatically only after those jobs have finished (not just been successfully submitted).

The reason for doing this is that we can only submit 1000 jobs at a time. This limit of 1000 includes both running and queued jobs. Jobs submitted with -hold_jid still count toward the limit of 1000. So I have to wait for all of the first 1000 jobs to finish (not simply be submitted) before I can submit the next 1000 jobs.

lanselibai
  • Possible duplicate of [How to make the bash script work with one command after another?](https://stackoverflow.com/q/49629366/608639), [Execute command after every command in bash](https://stackoverflow.com/q/45123034/608639), [Running multiple commands in one line in shell](https://stackoverflow.com/q/5130847/608639), [Run one command after another, even if I suspend the first one (Ctrl-z)](https://stackoverflow.com/q/13600319/608639), etc. – jww Jul 23 '19 at 06:01
  • What scheduler does your cluster use? Certainly there are scheduler specific options for your problem. – Fex Jul 23 '19 at 06:57
  • @jww not really, I want the previous jobs finished, not just submitted. – lanselibai Jul 23 '19 at 20:16
  • @Fex I am from UCL, how do I know which scheduler it uses? Can you help me search? https://wiki.rc.ucl.ac.uk/wiki/Main_Page – lanselibai Jul 23 '19 at 20:17
  • @lanselibai I cannot find the specific software used on your cluster, but I have proposed an idea for how to solve this problem. – Fex Jul 24 '19 at 09:51

1 Answer


Without the limit of 1000 submitted jobs, you could name your first jobs and then tell the following jobs to wait until the first ones have finished. But since all jobs would be submitted at once, I think you would run into your 1000-job limit.

qsub -N job1 ./a.sh
qsub -N job2 ./b.sh
qsub -hold_jid job1,job2 -N job3 ./c.sh
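
For illustration, mapped onto your numbered job scripts, the same idea might look roughly like the sketch below (an assumption-laden sketch only; as noted above it would still hit the 1000-job limit, because all 2000 jobs are submitted at once, and the job names and the use of seq are my own choices):

# name each first-batch job explicitly when submitting it
for i in $(seq 1 1000); do
    qsub -N "job$i" "job$i.sh"
done

# build the dependency list "job1,job2,...,job1000"
deps=$(seq -s, -f 'job%g' 1 1000)

# hold every second-batch job on all first-batch names
for i in $(seq 1001 2000); do
    qsub -hold_jid "$deps" -N "job$i" "job$i.sh"
done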

You could write a shell script that submits the first 1000 jobs, then waits until some jobs have finished and submits the next ones. The script can check how many jobs you currently have in the system with something like

qstat -u username | wc -l

If you have fewer than 1000 submitted jobs, the script could submit the next x jobs, where x = 1000 - #SubmittedJobs.
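
A minimal sketch of such a polling script might look like the following. It assumes an SGE-style qsub/qstat (as in your example), job scripts named job1.sh … job2000.sh in the current directory, and that qstat -u prints two header lines before the job list; the limit, total, username, and sleep interval are placeholders you would adjust for your cluster.

#!/bin/bash
# Sketch: keep at most MAX_JOBS jobs running or queued at once,
# topping up the queue until all TOTAL job scripts have been submitted.
MAX_JOBS=1000        # submission limit (running + queued)
TOTAL=2000           # total number of job scripts
USER_NAME=username   # replace with your cluster username
next=1               # index of the next job script to submit

while [ "$next" -le "$TOTAL" ]; do
    # Count this user's jobs currently running or queued
    # (tail -n +3 skips qstat's two header lines; adjust if yours differ).
    in_system=$(qstat -u "$USER_NAME" | tail -n +3 | wc -l)

    # Submit as many jobs as fit under the limit.
    free=$((MAX_JOBS - in_system))
    while [ "$free" -gt 0 ] && [ "$next" -le "$TOTAL" ]; do
        qsub "job${next}.sh"
        next=$((next + 1))
        free=$((free - 1))
    done

    # Wait before polling the queue again.
    sleep 300
done

You would start this once (for example inside screen or with nohup, so it keeps running after you close the PuTTY session), and it keeps submitting jobs whenever the count drops below the limit until all 2000 have gone through.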

Cluster operators vary in what user behaviour they tolerate, so it might be better to ask them whether this is OK. Also, some schedulers give new jobs from power users (here, in terms of number of jobs) a lower priority, so your new jobs could spend more time in the queue.

Fex
  • A job submitted with `-hold_jid` is still considered submitted, i.e. it still counts toward the 1000 limit. I edited my question. – lanselibai Jul 24 '19 at 13:09
  • @lanselibai The first part of the answer is the solution for how you can do this without a limit on submitted jobs. The second part describes a solution with a custom script that checks your number of submitted jobs and then submits more jobs until you again reach 1000 submitted jobs. – Fex Jul 25 '19 at 07:16
  • I see, sorry, could you show me how to achieve "If you have fewer than 1000 submitted jobs, the script could submit the next x jobs, where x = 1000 - #SubmittedJobs"? And can this be done automatically? – lanselibai Jul 25 '19 at 16:18