
I'm working on a Python script which opens multiple subprocesses like this:

for file in os.listdir(FOLDER):
    subprocess.Popen([myprocess])

Now there could be 10-20 of these processes running in parallel, and each of them prints a single line to the console. What I want to do is append these outputs (in any order) to an array and, when all the processes are done, continue with the rest of the script.

I have no idea how to append each output to the array. To check whether all subprocesses are done, I was thinking of something like this:

outputs = []
k = len(os.listdir(FOLDER))
if len(outputs) == k:
    print "All processes are done!"

UPDATE! This code seems to work now:

pids=set()
outputs = []
for file in os.listdir(FOLDER):
    p = subprocess.Popen(([args]), stdout=subprocess.PIPE)
    pids.add(p.pid)
while pids:
    pid,retval=os.wait()
    output = p.stdout.read()
    outputs.append(output)
    print('{p} finished'.format(p=pid))
    pids.remove(pid)

print "Done!"
print outputs

The problem is that the outputs look like this:

>> Done!
>> ['OUTPUT1', '', '', '', '', '', '', '', '', '']

Only the first value is filled; the others are left empty. Why?

Hyperion
  • You can use Queue, put each output in queue and then get them all in the array – Aleksander Monk Jun 06 '15 at 16:52
  • don't put solutions (answers) into your question. Post it as an answer instead (to allow voting, commenting) btw, your solution is incorrect. – jfs Jun 08 '15 at 18:52

2 Answers


What I want to do is to append these outputs (no matter in which order) to an array, and when all the processes are done, continue with the script doing other stuff.

#!/usr/bin/env python
import os
from subprocess import Popen, PIPE

# start processes (run in parallel)
processes = [Popen(['command', os.path.join(FOLDER, filename)], stdout=PIPE)
             for filename in os.listdir(FOLDER)]
# collect output
lines = [p.communicate()[0] for p in processes]

To limit the number of concurrent processes, you could use a thread pool:

#!/usr/bin/env python
import os
from multiprocessing.dummy import Pool, Lock
from subprocess import Popen, PIPE

def run(filename, lock=Lock()):
    with lock: # avoid various multithreading bugs related to subprocess
        p = Popen(['command', os.path.join(FOLDER, filename)], stdout=PIPE)
    return p.communicate()[0]

# no more than 20 concurrent calls
lines = Pool(20).map(run, os.listdir(FOLDER))

The latter example can read from several child processes concurrently, while the former essentially serializes execution once the corresponding stdout OS pipe buffers are full.
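For what it's worth, on Python 3.2+ (or Python 2 with the `futures` backport) the same bounded thread pool can be sketched with `concurrent.futures`; the `python -c` child below is only a stand-in for the real command:

```python
#!/usr/bin/env python
import sys
from concurrent.futures import ThreadPoolExecutor
from subprocess import check_output

# hypothetical stand-ins for the files in FOLDER
FILES = ['a.txt', 'b.txt', 'c.txt']

def run(filename):
    # check_output reads the child's stdout while it runs, so a large
    # output cannot fill the OS pipe buffer and stall the child
    return check_output([sys.executable, '-c',
                         'import sys; print(sys.argv[1])', filename])

# no more than 20 concurrent calls, same as Pool(20) above
with ThreadPoolExecutor(max_workers=20) as pool:
    lines = list(pool.map(run, FILES))
```

`pool.map` returns the results in input order, even though the children finish in arbitrary order.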

jfs

You could wait until all of them finish their job and then aggregate their standard outputs. To see how it's done, see this answer, which covers the implementation in depth.

If you need to do it asynchronously, you should spawn a new thread for this job, and do the waiting in that thread.
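A minimal sketch of that approach, assuming the `python -c` children below stand in for the real commands:

```python
import sys
import threading
from subprocess import Popen, PIPE

outputs = []

def collect(processes):
    # runs in the background; the main thread stays free for other work
    for p in processes:
        out, _ = p.communicate()  # reads stdout, then waits for exit
        outputs.append(out)

# stand-in children that each print one line
processes = [Popen([sys.executable, '-c', 'print(%d)' % i], stdout=PIPE)
             for i in range(3)]
waiter = threading.Thread(target=collect, args=(processes,))
waiter.start()
# ... do other stuff here ...
waiter.join()  # block only when the outputs are finally needed
```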

If you need to get notified about the results in real time, you could spawn a separate thread for each of the processes, wait for them in each of these threads, and update your list as they finish.
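For example, one thread per process, with a lock guarding the shared list (again using `python -c` children as stand-ins for the real commands):

```python
import sys
import threading
from subprocess import Popen, PIPE

outputs = []
lock = threading.Lock()

def wait_for(p):
    out, _ = p.communicate()  # reads stdout, then waits for exit
    with lock:                # guard the shared list and the print
        outputs.append(out)
        print('%d finished' % p.pid)

threads = []
for i in range(3):
    p = Popen([sys.executable, '-c', 'print(%d)' % i], stdout=PIPE)
    t = threading.Thread(target=wait_for, args=(p,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

Here the results arrive in completion order, not start order, which is why the list needs to be sorted (or tagged) if order matters.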

To read the output from the process, you can use subprocess.PIPE like presented in this answer.

Edit: here is a full sample that worked for me:

#!/usr/bin/python2
import random
import subprocess
import sys

outputs = []
processes = []
for i in range(4):
    args = ['bash', '-c', 'sleep ' + str(random.randint(0, 3)) + '; whoami']
    p = subprocess.Popen(args, stdout=subprocess.PIPE)
    processes.append(p)
while processes:
    p = processes[0]
    p.wait()
    output = p.stdout.read()
    outputs.append(output)
    print('{p} finished'.format(p=p.pid))
    sys.stdout.flush()
    processes.remove(p)
print outputs
rr-
  • The example provided works perfectly! But it doesn't say anything about how to read the output; I should replace the line `print('{p} finished'.format(p=pid))` with `outputs.append(output)`, but how do I read it? – Hyperion Jun 06 '15 at 17:32
  • It works but only for the first output, the others are left empty, any idea why? I've updated the question – Hyperion Jun 06 '15 at 18:13
  • Let me know if my modified example worked for you. (I don't know how `os.wait()` is supposed to act, but it looked weird, so I replaced it with `Popen.wait()`.) – rr- Jun 06 '15 at 18:29
  • it is unlikely in this case. But in general, this solution leads to a deadlock. – jfs Jun 08 '15 at 18:53
  • @J.F.Sebastian Can you elaborate? – rr- Jun 08 '15 at 18:54
  • Try a subprocess that generates output that is larger than the corresponding OS pipe buffer (65KB on my machine). – jfs Jun 08 '15 at 19:07
  • @rr-: think: where the output from children goes while the parent is blocked on `p.wait()`. **try it** e.g., `python -c "for c in 'abc': print(c*10**5)"` – jfs Jun 09 '15 at 10:34
  • Why a `while` loop? A `for` loop over `processes` would be more readable IMHO. – BlackJack Jun 09 '15 at 14:37
  • Because I was working with OP's code. If you believe my answer is wrong, you can edit it. – rr- Jun 09 '15 at 15:38