
Most of the examples I've seen with os.fork and the subprocess/multiprocessing modules show how to fork a new instance of the calling Python script or a chunk of Python code. What would be the best way to spawn a set of arbitrary shell commands concurrently?

I suppose I could just use subprocess.call or one of the Popen calls and pipe the output to a file, which I believe will return immediately, at least to the caller. I know this is not that hard to do; I'm just trying to figure out the simplest, most Pythonic way to do it.
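For illustration, roughly what I mean (the command and file name are just placeholders):

import subprocess

# Placeholder command and file name, just to illustrate the idea.
out = open('job1.log', 'wb')
p = subprocess.Popen(['long_running_cmd', '--arg'],
                     stdout=out, stderr=subprocess.STDOUT)
# Popen returns right away; the command keeps running while the script moves on.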

Thanks in advance

Stedy
mrmbd

5 Answers


All calls to subprocess.Popen return immediately to the caller. It's the calls to wait and communicate which block. So all you need to do is spin up a number of processes using subprocess.Popen (set stdin to /dev/null for safety), and then one by one call communicate until they're all complete.

Naturally I'm assuming you're just trying to start a bunch of unrelated (i.e. not piped together) commands.
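A minimal sketch of that approach, with made-up example commands:

import os
import subprocess

commands = [['sleep', '2'], ['ls', '-l'], ['uname', '-a']]   # example commands

devnull = open(os.devnull, 'rb')
procs = [subprocess.Popen(cmd, stdin=devnull,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
         for cmd in commands]

# All the processes are running concurrently at this point;
# now collect them one by one.
for cmd, p in zip(commands, procs):
    out, err = p.communicate()   # blocks until this particular process exits
    print(cmd, 'exited with', p.returncode)

devnull.close()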

Chris Eberle
  • Yes, I think that's what I need. I am trying to do just that. I want to fork the process(es), let them do their thing and come back later on, after running a number of other commands, to grep their output. Thanks! – mrmbd Nov 11 '11 at 15:57

I suppose I could just use subprocess.call or one of the Popen calls and pipe the output to a file, which I believe will return immediately, at least to the caller.

That's not a good way to do it if you want to process the data.

In that case, it is better to do

import subprocess

sp = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)

and then call sp.communicate(), or read directly via sp.stdout.read().

If the data is to be processed in the calling program at a later time, there are two ways to go:

  1. You can retrieve the data as soon as possible, perhaps via a separate thread that reads it and stores it somewhere the consumer can get it.

  2. You can let the producing subprocess block and retrieve the data from it only when you need it. The subprocess produces as much data as fits into the pipe buffer (usually 64 KiB) and then blocks on further writes. As soon as you need the data, you read() from the subprocess object's stdout (and maybe stderr as well) and use it - or, again, you call sp.communicate() at that later time.

Way 1 would be the way to go if producing the data takes a long time, so that your program would otherwise have to wait; a sketch is given below.

Way 2 would be preferred if the data is quite large and/or it is produced so fast that intermediate buffering would make no sense.
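A minimal sketch of way 1, assuming the output should simply be collected line by line into a queue for later consumption:

import queue            # Queue on Python 2
import subprocess
import threading

def drain(pipe, sink):
    # Read lines as the subprocess produces them so the pipe never fills up.
    for line in iter(pipe.readline, b''):
        sink.put(line)
    pipe.close()

sp = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
lines = queue.Queue()
reader = threading.Thread(target=drain, args=(sp.stdout, lines))
reader.daemon = True
reader.start()

# ... do other work here ...

sp.wait()
reader.join()
while not lines.empty():
    line = lines.get()   # consume the stored output whenever it is convenient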

glglgl
  • I should have added that I do want to process the data but not right away i.e. it may be a long running process, a server of some sort and after I fork it, my script will be off doing other things. When I come back to the forked process, I might just grep its output file for errors. . . – mrmbd Nov 11 '11 at 15:54

I like to use PTYs instead of pipes. For a bunch of processes where I only wanted to capture error messages, I did this:

import pty
import sys
from subprocess import Popen

RNULL = open('/dev/null', 'r')           # shared stdin for every child
WNULL = open('/dev/null', 'w')           # children's stdout is discarded
logfile = open("myprocess.log", "a", 1)  # line-buffered log file
REALSTDERR = sys.stderr
sys.stderr = logfile

This next part was in a loop spawning about 30 processes.

sys.stderr = REALSTDERR                  # restore the real stderr while forking
master, slave = pty.openpty()            # one PTY pair per child
self.subp = Popen(self.parsed,           # self.parsed is the argv list for this child
                  shell=False, stdin=RNULL, stdout=WNULL, stderr=slave)
sys.stderr = logfile                     # back to sending stderr to the log

After this I had a select loop which collected any error messages and sent them to the single log file. Using PTYs meant that I never had to worry about partial lines getting mixed up because the line discipline provides simple framing.
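The collection loop itself isn't shown here, but a rough sketch of the idea, assuming the PTY master FDs were kept in a list called masters and logfile is the log file opened above, might look like this:

import os
import select

# masters: the PTY master file descriptors collected while spawning the children
while masters:
    readable, _, _ = select.select(masters, [], [])
    for fd in readable:
        try:
            data = os.read(fd, 4096)
        except OSError:          # Linux raises EIO when the child side is closed
            data = b''
        if not data:
            masters.remove(fd)   # this child is done; stop watching its PTY
            os.close(fd)
        else:
            logfile.write(data.decode(errors='replace'))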

Michael Dillon

There is no single best approach for all possible circumstances; the best choice depends on the problem at hand.

Here's how to spawn a process and save its output to a file, combining stdout and stderr:

import os
import subprocess
import sys

def spawn(cmd, output_file):
    on_posix = 'posix' in sys.builtin_module_names
    return subprocess.Popen(cmd, close_fds=on_posix, bufsize=-1,
                            stdin=open(os.devnull,'rb'),
                            stdout=output_file,
                            stderr=subprocess.STDOUT)

To spawn multiple processes that can run in parallel with your script and each other:

processes, files = [], []
try:
    for i, cmd in enumerate(commands):
        files.append(open('out%d' % i, 'wb'))
        processes.append(spawn(cmd, files[-1]))
finally:
    for p in processes:
        p.wait()
    for f in files: 
        f.close()

Note: cmd is a list everywhere.
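If, as in the question, you want to come back to the processes later rather than waiting for them right away, you could drop the wait() loop and check on them whenever it suits you; Popen.poll() returns None while a process is still running:

# ... later, after the script has done other work:
for i, p in enumerate(processes):
    if p.poll() is None:
        print('out%d: still running' % i)
    else:
        print('out%d: exited with %s' % (i, p.returncode))

(You would still want to wait() on the processes and close the files eventually, as the finally block above does.)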

jfs
  • "There is no best for all possible circumstances." - I think that's true most of the time, I guess I just want some recommendations. – mrmbd Nov 11 '11 at 15:58
  • @mrmbd: I meant to say that the question is too broad for such a complex topic. – jfs Nov 11 '11 at 21:24

See an older answer of mine that includes code snippets. That code:

  • Uses processes, not threads, for blocking I/O, because they can be terminated more reliably via p.terminate()
  • Implements a retriggerable timeout watchdog that restarts counting whenever some output happens (a rough sketch of this idea is given below)
  • Implements a long-term timeout watchdog to limit overall runtime
  • Can feed in stdin (although I only need to feed in one-time short strings)
  • Can capture stdout/stderr in the usual Popen means (Only stdout is coded, and stderr redirected to stdout; but can easily be separated)
  • It's almost realtime because it only checks every 0.2 seconds for output. But you could decrease this or remove the waiting interval easily
  • Lots of debugging printouts are still enabled, to see what's happening when.

For spawning multiple concurrent commands, you would need to alter the RunCmd class to instantiate multiple read-output/write-input queues and to spawn multiple Popen subprocesses.
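The linked code isn't reproduced here, but the core of the retriggerable-watchdog idea looks roughly like this (a sketch only, with made-up names, not the actual RunCmd class):

import select
import subprocess
import time

def run_with_watchdog(cmd, idle_timeout=5.0, total_timeout=60.0, poll_interval=0.2):
    # Restart the idle countdown whenever output arrives,
    # and enforce an overall runtime limit as well.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    start = last_output = time.time()
    chunks = []
    while p.poll() is None:
        ready, _, _ = select.select([p.stdout], [], [], poll_interval)
        now = time.time()
        if ready:
            data = p.stdout.read1(4096)
            if data:
                chunks.append(data)
                last_output = now            # retrigger the idle watchdog
        if now - last_output > idle_timeout or now - start > total_timeout:
            p.terminate()                    # processes can be terminated reliably
            break
    chunks.append(p.stdout.read())           # drain whatever is left
    return b''.join(chunks)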

cfi