There is a program (which I cannot modify) that writes two output files. I'm trying to write a Python wrapper around it that streams both outputs through pipes and interleaves them into a single output, 4 lines at a time.
I have a working wrapper in Bash, built on nested process substitutions and named pipes.
mkfifo out1
mkfifo out2
./someProgram out1 out2 &
# group each FIFO's output into 4-line records, pair the records from the
# two streams side by side, then turn the tabs back into newlines
paste <(paste - - - - < out1) <(paste - - - - < out2) | tr '\t' '\n'
wait
rm out1 out2
I'm now trying to translate this into Python.
# `named_pipes()` function as defined in http://stackoverflow.com/a/28840955/459780
from subprocess import Popen, PIPE

with named_pipes(4) as paths:
    data1, data2, paste1, paste2 = paths
    # the program of interest writes to the first two FIFOs
    proc1 = Popen(['./someProgram', data1, data2])
    with open(data1, 'r') as stream1, open(data2, 'r') as stream2, \
            open(paste1, 'w') as stream3, open(paste2, 'w') as stream4:
        # group each output into 4-line records, as in the Bash version
        pastecmd = ['paste', '-', '-', '-', '-']
        proc2 = Popen(pastecmd, stdin=stream1, stdout=stream3)
        proc3 = Popen(pastecmd, stdin=stream2, stdout=stream4)
        # pair the grouped records side by side
        proc4 = Popen(['paste', paste1, paste2], stdout=PIPE)
        # final stage, meant to mirror `tr '\t' '\n'` from the Bash version
        proc5 = Popen(['tr', "'\t'", "'\n'"], stdin=proc4.stdout)
        proc5.communicate()
    proc1.wait()
It's deadlocking, probably because I'm not calling communicate() and wait() correctly. What am I doing wrong?
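To make the intended behaviour concrete, here is a rough pure-Python sketch of the interleaving itself, with no paste or tr involved. It reuses the named_pipes() helper from the linked answer and assumes both streams produce the same number of complete 4-line records; my question is about the subprocess version above, not this sketch.
import sys
from itertools import islice
from subprocess import Popen

with named_pipes(2) as (out1, out2):
    proc = Popen(['./someProgram', out1, out2])
    with open(out1) as stream1, open(out2) as stream2:
        while True:
            rec1 = list(islice(stream1, 4))  # next 4-line record from the first output
            rec2 = list(islice(stream2, 4))  # next 4-line record from the second output
            if not rec1 and not rec2:
                break
            sys.stdout.writelines(rec1 + rec2)
    proc.wait()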
EDIT: The following dummy script behaves similarly to the program of interest. Save it as someProgram if you actually want to run the Bash and Python code above.
#!/usr/bin/env python
from __future__ import print_function
import sys
with open(sys.argv[1], 'w') as f1, open(sys.argv[2], 'w') as f2:
    for i in range(1000):
        print('@read{}/1'.format(i), 'ACGT'*25, '+', 'BBBB'*25, sep='\n', file=f1)
        print('@read{}/2'.format(i), 'ACGT'*25, '+', 'BBBB'*25, sep='\n', file=f2)
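With the dummy saved as someProgram and made executable (chmod +x someProgram), the interleaved output should begin with read 0's /1 record followed by its /2 record, four lines each; the sequence and quality lines are 100 characters long and are shortened here:
@read0/1
ACGTACGT...ACGT    ('ACGT' repeated 25 times)
+
BBBBBBBB...BBBB    ('BBBB' repeated 25 times)
@read0/2
ACGTACGT...ACGT
+
BBBBBBBB...BBBB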