
I want to start 10 jobs at a time in the background, wait for them to finish, and then start the next 10 in parallel, repeating until all 100 jobs are done.

Here is the Python code that calls the shell script:

from subprocess import call

# other code here.

# This variable is basically # of jobs/batch.
windowsize = 10

# Here is how I call the shell command. I have 100 jobs in total that I want as 10 batches with 10 jobs/batch.
for i in range(0, 100, windowsize):
    numjobs = i + windowsize

    # Start 10 jobs in parallel at a time
    for j in range(i, numjobs):
        call(["./myscript.sh", "/usr/share/file1.txt", "/usr/share/file2.txt"], shell=True)

    # Hoping that this waits until the 10 jobs that were recently started in the background finish.
    call(["wait"], shell=True)

In my shell script I have this

#!/bin/sh

# I start the job in the background. Each job takes a few minutes to finish.

shell command $1 $2 &
...

Unfortunately, all 100 jobs are started and not 10 batches with 10 jobs/batch.

glenn jackman
user3803714
  • Do you want to maintain 10 jobs (no more, no less) or do you want to wait until the whole batch ends before starting a new one? `//` is not a comment in Python. Fix the source code formatting in your question. – jfs Feb 19 '15 at 02:17
  • maintain 10 jobs (no more, no less) is preferred, but the 2nd option wait until the whole batch ends before starting a new one is also acceptable. I have fixed the comments ;). – user3803714 Feb 19 '15 at 02:25
  • 1
    You could use a thread pool. See [Python threading multiple bash subprocesses?](http://stackoverflow.com/q/14533458/4279) and if you want to capture subprocesses' output: [Python: execute cat subprocess in parallel](http://stackoverflow.com/a/23616229/4279). The question is unreadable, fix the formatting: click the question mark inside the circle while editing and [read the formatting help](http://stackoverflow.com/editing-help). – jfs Feb 19 '15 at 02:31

1 Answer


There is no (straightforward) way to wait for a grandchild process: `call(["wait"], shell=True)` runs `wait` in a brand-new shell that has no children of its own, so it returns immediately. Add `wait` at the end of `myscript.sh` instead.
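Alternatively, since a process can only `wait` on its own children, you can skip the shell backgrounding entirely and launch the jobs directly with `Popen`, waiting on them from Python. A minimal batch-at-a-time sketch (the `run_in_batches` helper is hypothetical, and it assumes `myscript.sh` runs its command in the foreground):

```python
from subprocess import Popen

def run_in_batches(cmd, total=100, windowsize=10):
    """Run `total` copies of `cmd`, no more than `windowsize` at a time."""
    rcs = []
    for start in range(0, total, windowsize):
        # start one batch in parallel; Popen() does not block
        batch = [Popen(cmd) for _ in range(min(windowsize, total - start))]
        # wait() blocks until each child exits, so the next batch
        # only starts after the whole current batch is done
        rcs.extend(p.wait() for p in batch)
    return rcs
```

With the question's command this would be `run_in_batches(["./myscript.sh", "/usr/share/file1.txt", "/usr/share/file2.txt"])`; note that each batch runs as long as its slowest job.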

To limit the number of concurrently running subprocesses, you could use a thread pool:

#!/usr/bin/env python
import logging
from multiprocessing.pool import ThreadPool
from subprocess import call

windowsize = 10  # number of jobs allowed to run at once
cmd = ["./myscript.sh", "/usr/share/file1.txt", "/usr/share/file2.txt"]

def run(i):
    # call() blocks until the command exits, so each pool thread
    # runs exactly one job at a time
    return i, call(cmd)

logging.basicConfig(format="%(asctime)-15s %(message)s", datefmt="%F %T",
                    level=logging.INFO)
pool = ThreadPool(windowsize)  # at most `windowsize` jobs run concurrently
for i, rc in pool.imap_unordered(run, range(100)):
    logging.info('%s-th command returns %s', i, rc)

NOTE: shell=True is dropped.
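The thread pool above already gives the asker's preferred "maintain 10 jobs, no more, no less" behaviour. If threads are unavailable, the same behaviour can be sketched with `Popen.poll()` alone; the `run_limited` helper below is a hypothetical hand-rolled version, not from the answer:

```python
import time
from subprocess import Popen

def run_limited(cmd, total, windowsize, interval=0.05):
    """Keep up to `windowsize` copies of `cmd` running until `total` have run."""
    running, finished, started = [], [], 0
    while started < total or running:
        # reap jobs that have exited; poll() returns None while a job runs
        still_running = []
        for p in running:
            rc = p.poll()
            if rc is None:
                still_running.append(p)
            else:
                finished.append(rc)
        running = still_running
        # top the window back up: a new job starts as soon as one exits
        while started < total and len(running) < windowsize:
            running.append(Popen(cmd))
            started += 1
        if running:
            time.sleep(interval)
    return finished
```

Compared to the `ThreadPool` version this needs no threads, at the cost of a polling delay of up to `interval` seconds before a freed slot is refilled.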

jfs
  • Unfortunately, this does not work. I have a print statement in the run method. Even that does not get printed. My script of course is not executed. – user3803714 Feb 19 '15 at 07:29
  • It does work (I've tested it). Do not modify the code, copy-paste it as is. Make sure `myscript.sh` works, run `check_call(['./myscript.sh', 'a','b'])` once in the main thread to make sure. – jfs Feb 19 '15 at 07:37
  • I have Python 2.6.6. I copy-pasted the code. I have changed myscript.sh to a script that just does `echo $1 $2`. I used the above code and it just hangs. Not sure what is going on. – user3803714 Feb 19 '15 at 19:09
  • @user3803714: There is probably a bug in Python 2.6 because it works fine on Python 2.7/3. I've tweaked the code to work on Python 2.6 too. – jfs Feb 20 '15 at 00:59