
I am trying to create a variable batch size to make subprocess calls. I am a bit confused about the best way to have a batch of 5 kick off, wait for all 5 to complete, then kick off the next five.

What I have so far is:

batchSize = 5
proccessArray = process.split(",")
processLength = len(proccessArray) - 1
counter1 = 0
for i in range(0, processLength, batchSize):
    for x in range(1, batchSize):
        d = {}
        if counter1 < processLength:
            dgclOutput = inputPath + st + "_" + (i + x) + "output"
            d["process{0}".format(x)] = subprocess.Popen(proccessArray(i + x) + ">>" + dgclOutput, shell=True, stdout=subprocess.PIPE)
            counter1 + 1
        else:
            break

batchSize is the number of processes I wish to run at a time. processArray is a list of the commands it needs to call. processLength is the number of possible commands. counter1 is to kick out of the loop when it reaches the max.

So my outer loop steps by the batch size, and an inner loop creates 5 subprocesses in a dictionary and kicks them off.

This code does not work. Does anyone have an idea how to make it work, or a better solution?

theMadKing

3 Answers


I think you're probably looking for something along these lines:

batchSize = 5
processArray = process.split(",")
for i in xrange(0, len(processArray), batchSize):
    batch = processArray[i:i+batchSize]  # len(batch) <= batchSize
    ps = []
    for process in batch:
        output = "..."
        p = subprocess.Popen(process + " >> " + output, shell=True, stdout=subprocess.PIPE)
        ps.append(p)
    # wait for every process in this batch before starting the next batch
    for p in ps:
        p.wait()
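
Note that xrange is Python 2 only; on Python 3 the same loop works with range. A minimal Python 3 sketch of the same batching idea, assuming process is the OP's comma-separated command string:

import subprocess

batchSize = 5
processArray = process.split(",")  # assumes `process` is the comma-separated command string
for i in range(0, len(processArray), batchSize):
    batch = processArray[i:i + batchSize]  # the last batch may be shorter
    ps = [subprocess.Popen(cmd, shell=True) for cmd in batch]
    for p in ps:
        p.wait()  # block until the whole batch has finished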
Myk Willis

You want to do something like this. Say you have a list, commands, of all the commands you want to run.

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l[i:i+n]

for next_batch in chunks(commands, 5):
    # Start the next few subprocesses
    subps = [subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
             for cmd in next_batch]
    # Wait for these to finish
    for subp in subps:
        subp.wait()

(Chunks function taken from this answer.)
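
For instance, calling it on a hypothetical seven-element list yields two full chunks and one short one:

>>> list(chunks(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 3))
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]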

Claudiu
  • don't use `stdout=PIPE` unless you read from the pipe. OP probably wants an analog of `stdout=open(dgclOutput, 'a')` here. Also, `shell=True` should be discouraged. Otherwise, `chunks()` is the correct approach here. – jfs Sep 16 '15 at 00:25
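
Applying jfs's comment, here is a rough sketch of the same batching loop that appends each command's output to a file and avoids shell=True. The output_path_for() helper is hypothetical, and shlex.split assumes each command is a plain argv string with no shell operators:

import shlex
import subprocess

for next_batch in chunks(commands, 5):
    procs = []
    for cmd in next_batch:
        outfile = open(output_path_for(cmd), 'a')  # hypothetical helper naming the log file
        # No shell involved: split the command into an argv list and let
        # Popen append the child's stdout straight to the file.
        procs.append((subprocess.Popen(shlex.split(cmd), stdout=outfile), outfile))
    for proc, outfile in procs:
        proc.wait()
        outfile.close()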

You need the .communicate() or .wait() methods of the Popen object to wait for the processes to finish. Alternatively, you can use .poll() to check whether a subprocess has finished.

batchSize = 5
processArray = process.split(",")
processLength = len(processArray)
for i in range(0, processLength, batchSize):
    d = {}
    for x in range(batchSize):
        if i + x >= processLength:
            break
        dgclOutput = inputPath + st + "_" + str(i + x) + "output"
        d["process{0}".format(x)] = subprocess.Popen(
            processArray[i + x] + " >> " + dgclOutput, shell=True)
    # .wait() blocks until each process in the current batch has finished
    for p in d.values():
        p.wait()

There is a better example here: Python subprocess in parallel
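
For the .poll() alternative mentioned above, here is a minimal sketch of the wait phase, assuming d holds the current batch's Popen objects as in the code above:

import time

running = list(d.values())
while running:
    # poll() returns None while a process is still running
    running = [p for p in running if p.poll() is None]
    if running:
        time.sleep(0.5)  # sleep briefly instead of busy-waiting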

ilke444