I am running Python scripts on a computing cluster (Slurm) in two sequential stages. I wrote two Python scripts, one for Stage 1 and another for Stage 2. Every morning I visually check whether all Stage 1 jobs have completed, and only then do I start Stage 2.

Is there a more elegant/automated way by combining all stages and job management in a single python script? How can I tell if the job has completed?

The workflow is similar to the following:

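from time import sleep

# job_list and its methods are placeholders for the desired interface
while not job_list.all_complete():
    for job in job_list:
        if job.empty():              # nothing submitted yet
            job.submit_stage1()

        if job.complete_stage1():    # Stage 1 finished
            job.submit_stage2()

    sleep(60)  # wait a minute before checking again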
Simon
  • What is the output of these stages? How do you know when a stage is done? – Nick Chapman Mar 28 '19 at 18:24
  • I try to make it as general as possible. Is there any way to skip the "end file"? – Simon Mar 28 '19 at 19:11
  • I mean, you could have your jobs return a value when they're done. But that's hard in a distributed system. – Nick Chapman Mar 28 '19 at 19:12
  • I suppose it is also necessary to keep the Python script running in the background to monitor the jobs until all of them are done. So maybe the pseudocode is a good direction to go. But what if a job goes wrong? How could I trigger a rerun? – Simon Mar 28 '19 at 19:18
  • see here https://stackoverflow.com/questions/26890312/how-to-design-a-distributed-job-scheduler – Nick Chapman Mar 28 '19 at 19:50

2 Answers

You have several courses of action:

damienfrancois

You haven't given much detail on how to determine whether a job is finished, but a common way to solve this problem is to have each job create a sentinel file that you can look for, named something like COMPLETE.

To do this you just add something like

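# At the end of stage 1, create an empty sentinel file
job_num = 1234
# f-string so that {job_num} is substituted; mode 'x' fails if the file already exists
open(f'/shared/file/system/or/server/JOB_{job_num}/COMPLETE', 'x').close()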

Then poll periodically and start Stage 2 only once every job has a COMPLETE file.
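The polling step could look roughly like the sketch below. It assumes the same JOB_{job_num}/COMPLETE layout as the snippet above. The function names all_complete and wait_for_stage1, the base_dir argument, and the poll and timeout parameters are made up for illustration and are not part of any Slurm or Python API.

```python
import time
from pathlib import Path


def all_complete(base_dir, job_nums, sentinel="COMPLETE"):
    """Return True if every JOB_<n> directory under base_dir holds the sentinel file."""
    base = Path(base_dir)
    return all((base / f"JOB_{n}" / sentinel).exists() for n in job_nums)


def wait_for_stage1(base_dir, job_nums, poll_seconds=60, timeout_seconds=None):
    """Poll until all sentinel files exist.

    Returns True once every job has its sentinel file, or False if
    timeout_seconds elapses first (None means wait indefinitely).
    """
    start = time.monotonic()
    while not all_complete(base_dir, job_nums):
        if timeout_seconds is not None and time.monotonic() - start >= timeout_seconds:
            return False
        time.sleep(poll_seconds)
    return True
```

Once wait_for_stage1 returns True, the script can go on to submit Stage 2. A timeout is one simple way to notice jobs that crashed and never wrote their file, at which point you could inspect or resubmit them.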

Nick Chapman