3

The code looks like this:

from subprocess import Popen, PIPE


p1 = Popen("command1", stdout = PIPE)
p2 = Popen("command2", stdin = p1.stdout, stdout = PIPE)
result_a = p2.communicate()[0]

p1_again = Popen("command1", stdout = PIPE)
p3 = Popen("command3", stdin = p1_again.stdout, stdout = PIPE)
result_b = p3.communicate()[0]

with open("test") as tf:
    p1_again_again = Popen("command1", stdout = tf)
    p1_again_again.communicate()

The problem is:

command1 is executed three times, because once I call communicate, the stdout of that Popen object can't be read again. I was just wondering whether there's a way to reuse the intermediate results of the PIPE.

Does anyone have ideas about how to make this code better (better performance as well as fewer lines of code)? Thanks!

Hanfei Sun
    You can read the output of p1, and write the output to the input stream of p2. Check this out: http://stackoverflow.com/questions/163542/python-how-do-i-pass-a-string-into-subprocess-popen-using-the-stdin-argument – nhahtdh Nov 22 '12 at 03:35
  • I guess you should execute `result_p1 = p1.communicate()[0]` before using another Popen for p2, then pass result_p1 as p2's stdin; this way you will always have the stdout of p1 in result_p1 (see the sketch below). – avasal Nov 22 '12 at 03:46
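
A minimal sketch of the approach suggested in the comments: capture command1's output once with communicate, then feed those bytes to each downstream command. command1, command2, and command3 are the question's placeholder names, not real executables.

from subprocess import Popen, PIPE

# Run command1 once and keep its entire output in memory.
out1 = Popen("command1", stdout=PIPE).communicate()[0]

# Feed the captured bytes to each downstream command via its stdin.
result_a = Popen("command2", stdin=PIPE, stdout=PIPE).communicate(out1)[0]
result_b = Popen("command3", stdin=PIPE, stdout=PIPE).communicate(out1)[0]

# The same bytes can also be written straight to the file.
with open("test", "wb") as tf:
    tf.write(out1)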

2 Answers

3

Here is a working solution. I have put example commands for cmd1, cmd2, and cmd3 so that you can run it. It simply takes the output from the first command, uppercases it with one command and lowercases it with the other.

code

from subprocess import Popen, PIPE
from tempfile import TemporaryFile

cmd1 = ['echo', 'Hi']
cmd2 = ['tr', '[:lower:]', '[:upper:]']
cmd3 = ['tr', '[:upper:]', '[:lower:]']

with TemporaryFile() as f:
    p = Popen(cmd1, stdout=f)   # run cmd1 once, writing its output to the temp file
    ret_code = p.wait()         # wait until cmd1 has finished
    f.flush()
    f.seek(0)                   # rewind so cmd2 reads the file from the start
    out2 = Popen(cmd2, stdin=f, stdout=PIPE).stdout.read()
    f.seek(0)                   # rewind again for cmd3
    out3 = Popen(cmd3, stdin=f, stdout=PIPE).stdout.read()
    print out2, out3

output

HI
hi

Some things to note about the solution. The tempfile module is always a great way to go when you need to work with temporary files; it automatically deletes the temporary file as cleanup once the with statement exits, even if an I/O exception is thrown inside the with block. cmd1 is run only once and its output goes to the temp file; wait() is called to make sure execution has completed, and then seek(0) is called each time so that when read() is called on f it starts from the beginning of the file. As a reference, the question Saving stdout from subprocess.Popen to file helped me get the first part of the solution.

Marwan Alsabbagh
  • `f.flush()` in the main process might not affect the file buffer in the child process in any way. You could call `.wait()` for `cmd2`, `cmd3` to avoid zombies. – jfs Nov 22 '12 at 12:42
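
A sketch of the answer's example adjusted along the lines of this comment, using communicate() so the downstream processes are read and reaped in one step (Python 2 syntax, to match the answer above; the tr commands are the same stand-ins):

from subprocess import Popen, PIPE
from tempfile import TemporaryFile

cmd1 = ['echo', 'Hi']
cmd2 = ['tr', '[:lower:]', '[:upper:]']
cmd3 = ['tr', '[:upper:]', '[:lower:]']

with TemporaryFile() as f:
    Popen(cmd1, stdout=f).wait()   # cmd1 writes straight into the temp file
    f.seek(0)
    out2 = Popen(cmd2, stdin=f, stdout=PIPE).communicate()[0]  # reads stdout and waits for cmd2
    f.seek(0)
    out3 = Popen(cmd3, stdin=f, stdout=PIPE).communicate()[0]  # same for cmd3, so no zombies are left
    print out2, out3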
0

If all of command1's output fits in memory, you can read it once and then run command2 and command3 one after another:

#!/usr/bin/env python
from subprocess import Popen, PIPE, check_output as qx

cmd1_output = qx(['ls']) # get all output

# run commands in sequence
results = [Popen(cmd, stdin=PIPE, stdout=PIPE).communicate(cmd1_output)[0]
           for cmd in [['cat'], ['tr', 'a-z', 'A-Z']]]

Or you can write to a temporary file first if command1 generates a gigantic output that can't fit in memory, as @Marwan Alsabbagh suggested:

#!/usr/bin/env python
import tempfile
from subprocess import check_call, check_output as qx

with tempfile.TemporaryFile() as file: # deleted automatically on closing
    # run command1, wait for completion
    check_call(['ls'], stdout=file)

    # run commands in sequence
    results = []
    for cmd in [['cat'], ['tr', 'a-z', 'A-Z']]:
        file.seek(0)
        results.append(qx(cmd, stdin=file))

To handle input/output to/from subprocesses in parallel you could use threading:

#!/usr/bin/env python3
from contextlib import ExitStack  # pip install contextlib2 (stdlib since 3.3)
from subprocess import Popen, PIPE
from threading import Thread

def tee(fin, *files):
    try:
        for chunk in iter(lambda: fin.read(1 << 10), b''):
            for f in files:  # fan out
                f.write(chunk)
    finally:
        for f in (fin,) + files:
            try:
                f.close()
            except OSError:
                pass

with ExitStack() as stack:
    # run commands asynchronously
    source_proc = Popen(["command1", "arg1"], stdout=PIPE)
    stack.callback(source_proc.wait)
    stack.callback(source_proc.stdout.close)

    processes = []
    for command in [["tr", "a-z", "A-Z"], ["cat"]]:
        processes.append(Popen(command, stdin=PIPE, stdout=PIPE))
        stack.callback(processes[-1].wait)
        stack.callback(processes[-1].stdout.close) # use .terminate()
        stack.callback(processes[-1].stdin.close)  # if it doesn't kill it

    fout = open("test.txt", "wb")
    stack.callback(fout.close)

    # fan out source_proc's output
    Thread(target=tee, args=([source_proc.stdout, fout] +
                             [p.stdin for p in processes])).start()

    # collect results in parallel
    results = [[] for _ in range(len(processes))]
    threads = [Thread(target=r.extend, args=[iter(p.stdout.readline, b'')])
               for p, r in zip(processes, results)]
    for t in threads: t.start()
    for t in threads: t.join() # wait for completion

I've used ExitStack here for proper cleanup in case of exceptions.
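
For illustration, a tiny standalone example (not part of the pipeline above) of how ExitStack unwinds its registered callbacks in reverse order, even when an exception escapes the with block:

#!/usr/bin/env python3
from contextlib import ExitStack

with ExitStack() as stack:
    stack.callback(print, "cleanup 1")  # registered first, runs last
    stack.callback(print, "cleanup 2")  # registered last, runs first
    raise RuntimeError("boom")  # both callbacks still run before the error propagates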

jfs