
I am trying to run multiple Python programs in sequence, and I want to capture the stdout of each process in one or more files, using tempfile (or a better way, if there is one, which would be awesome). The file or files generated by process 1 will be used as input for the next process. How do I do that?

So my code is something like this:

# multiple files
from subprocess import Popen, PIPE, STDOUT

l_out_files = []
for file in files:
    P1 = Popen('python Prog1.py ' + file, stdout=PIPE, stderr=STDOUT, shell=True)
    # Now I want to capture the output to files so it can be used by the next process
    # It didn't work
    l_out_files.append(P1.stdout)
# Prog2 requires multiple input files to be separated by spaces
P2 = Popen('python Prog2.py ' + ' '.join(l_out_files), stdout=PIPE, stderr=STDOUT, shell=True)
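For reference, capturing each child's stdout into a named temporary file (as the question hints at with tempfile) could look roughly like this sketch; the inline `-c` command here is only a stand-in for `python Prog1.py <file>`, and the input names are placeholders:

```python
import sys
import tempfile
from subprocess import check_call

inputs = ['alpha', 'beta']  # stand-ins for the real input files

l_out_files = []
for name in inputs:
    # Capture the child's stdout into a named temporary file on disk.
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.out', delete=False)
    with tmp:
        # The real pipeline would run: [sys.executable, 'Prog1.py', name]
        check_call([sys.executable, '-c', 'import sys; print(sys.argv[1])', name],
                   stdout=tmp)
    l_out_files.append(tmp.name)

# The second stage would then receive the file names as separate arguments, e.g.
# check_call([sys.executable, 'Prog2.py'] + l_out_files)
```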

Thanks a lot guys,

Best

  • what are you actually trying to pass, are both p1 and p2 in the loop? – Padraic Cunningham Sep 21 '15 at 20:01
  • @PadraicCunningham I fixed that, p1 goes inside the loop but not p2. – BioInformatician Sep 21 '15 at 20:05
  • so Prog2.py takes a string of file names? Also why not just do all this from python, what are your programs actually doing? – Padraic Cunningham Sep 21 '15 at 20:06
  • @PadraicCunningham yes string of file names from the output generated from p1. – BioInformatician Sep 21 '15 at 20:09
  • I added an answer but I am pretty confident you can achieve what you want without using subprocess – Padraic Cunningham Sep 21 '15 at 20:12
  • (1) why do you run Python code as an external process instead of just using `import module1, module2` and calling corresponding functions? (2) Can you change `Prog1.py`, `Prog2.py`? (3) Is `Prog1.py`'s output limited? Does it write to stdout or it opens some output file internally? (4) to avoid guesses, provide dummy Prog1.py Prog2.py that generate some data for testing. – jfs Sep 22 '15 at 11:24

2 Answers

2

As far as your own code goes, appending P1.stdout just appends a reference to the pipe's file object to your list, so obviously that is not going to work; you would need P1.communicate()[0] to extract the output.
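As a minimal illustration of that (using an inline `-c` script in place of Prog1.py), extracting the output looks roughly like:

```python
import sys
from subprocess import Popen, PIPE, STDOUT

# A toy child process that writes a single line to stdout.
p = Popen([sys.executable, '-c', 'print("hello")'], stdout=PIPE, stderr=STDOUT)
out, _ = p.communicate()  # waits for the child and returns its output as bytes
```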

I imagine this could all be done without the need for subprocess, but you can create the list of outputs with check_output. I am not really sure why you are redirecting stderr to STDOUT either:

from subprocess import check_output, STDOUT

data = [check_output(['python', 'Prog1.py', file], stderr=STDOUT) for file in files]

out = check_output(['python', 'Prog2.py', ' '.join(data)], stderr=STDOUT)

check_output will raise an error for any non-zero exit status, which it probably should, as passing error output on to your Python program would more than likely break it.

To catch any errors you can use a try/except to catch the CalledProcessError:

from subprocess import check_output, STDOUT, CalledProcessError

data = []
for file in files:
    try:
        data.append(check_output(['python', 'Prog1.py', file], stderr=STDOUT))
    except CalledProcessError as e:
        print(e.output)
Padraic Cunningham
  • That didn't work. Prog1 displays some messages to notify the user of the status of the analysis and so on; only those messages are captured with check_output, not the actual output, which Prog1 writes to a file. – BioInformatician Sep 21 '15 at 22:18
  • How can you catch output written to a file? You need to add what your code is actually doing and exactly what you expect to happen – Padraic Cunningham Sep 21 '15 at 22:20
  • use `'Prog2.py'] + filenames` instead of `'Prog2.py', ' '.join(data)]` if you think that `Prog1.py` writes filename where it stores its result to stdout – jfs Sep 22 '15 at 12:29
0

It looks like you want to emulate bash process substitution:

#!/usr/bin/env python
from subprocess import check_call

check_call('prog2 <(prog1 file1) <(prog1 file2) <(prog1 file3) ...',
           shell=True, executable='/bin/bash')

where prog2 runs python Prog2.py and prog1 runs python Prog1.py. It assumes that prog1 writes its result to stdout and that prog2 accepts its input filenames on the command line. Bash process substitution allows passing the output from prog1 to prog2 without writing the data to disk and without accumulating it all in memory (in case it is large).

While you can do it in pure Python without the shell (the link shows named-pipe-based and fdescfs (/dev/fd/#)-based solutions, though you could use other methods if necessary), it would be easier if prog2 accepted its input on stdin:

#!/usr/bin/env python
from subprocess import check_call

check_call('{ prog1 file1; prog1 file2; prog1 file3; ...; } | prog2',
           shell=True)

You can do it in pure Python without the shell:

#!/usr/bin/env python3
from subprocess import Popen, PIPE, check_call

with Popen('prog2', stdin=PIPE) as prog2:
    for filename in files:
        check_call(['prog1', filename], stdout=prog2.stdin)
jfs