There is a program (which I cannot modify) that creates two output files. I'm trying to write a Python wrapper that streams both outputs through named pipes and interleaves them into a single output, 4 lines at a time.

I have implemented a working wrapper in Bash. It relies on named pipes and nested process substitutions.

mkfifo out1
mkfifo out2

./someProgram out1 out2 &
paste <(paste - - - - < out1) <(paste - - - - < out2) | tr '\t' '\n'

wait
rm out1 out2

I'm now trying to translate this into Python.

from subprocess import Popen, PIPE

# `named_pipes()` function as defined in http://stackoverflow.com/a/28840955/459780
with named_pipes(4) as paths:
    data1, data2, paste1, paste2 = paths
    proc1 = Popen(['./someProgram', data1, data2])

    with open(data1, 'r') as stream1, open(data2, 'r') as stream2, \
            open(paste1, 'w') as stream3, open(paste2, 'w') as stream4:
        pastecmd = ['paste', '-', '-', '-', '-']
        proc2 = Popen(pastecmd, stdin=stream1, stdout=stream3)
        proc3 = Popen(pastecmd, stdin=stream2, stdout=stream4)
        proc4 = Popen(['paste', paste1, paste2], stdout=PIPE)
        proc5 = Popen(['tr', "'\t'", "'\n'"], stdin=proc4.stdout)

        proc5.communicate()
        proc1.wait()

It's deadlocking, probably because I'm not calling the communicate() and wait() functions correctly. What am I doing wrong?
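
One thing that may be worth checking before looking at `communicate()` and `wait()`: on POSIX systems, opening a FIFO for writing blocks until some process has opened it for reading. In the code above, `open(paste1, 'w')` runs before any process that reads `paste1` has been started, so the hang could plausibly happen there. The sketch below probes a FIFO without blocking; `fifo_has_reader` is a hypothetical helper name, not part of the original code, and it assumes a POSIX platform.

```python
import errno
import os


def fifo_has_reader(path):
    """Return True if some process currently has the FIFO open for reading.

    A non-blocking open for writing fails with ENXIO when no reader
    exists, whereas a plain blocking open would simply wait.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return False  # nobody is reading: a blocking open would hang
        raise
    os.close(fd)
    return True
```

If this returns `False` for `paste1` at the point where the `with open(...)` statement runs, the blocking open is the likely culprit.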


EDIT: The following dummy script behaves similarly to the program of interest. Save it as someProgram if you actually want to run the Bash and Python code above.

#!/usr/bin/env python
from __future__ import print_function
import sys
with open(sys.argv[1], 'w') as f1, open(sys.argv[2], 'w') as f2:
    for i in range(1000):
        print('@read{}/1'.format(i), 'ACGT'*25, '+', 'BBBB'*25, sep='\n', file=f1)
        print('@read{}/2'.format(i), 'ACGT'*25, '+', 'BBBB'*25, sep='\n', file=f2)
Daniel Standage
  • `someProgram` processes short sequences of DNA represented as strings. – Daniel Standage Jun 09 '16 at 00:43
  • @DanielStandage Are you sure it is terminating in a timely manner? – Natecat Jun 09 '16 at 00:52
  • @Natecat When I run the shell wrapper, it executes and finishes within seconds. The Python wrapper hangs indefinitely. I edited the question to provide a dummy script for testing. – Daniel Standage Jun 09 '16 at 01:49
  • This has nothing to do with communicate or wait, your code never gets past the `with open...` – Padraic Cunningham Jun 09 '16 at 10:41
  • I've updated [my answer to your related question, to produce the desired output: it shows how to print 4 lines from one then 4 lines from another file, etc](http://stackoverflow.com/a/37686462/4279) – jfs Jun 10 '16 at 16:17
  • If you have a working bash script, why do you need to translate it to Python? (just run the bash script directly). – jfs Jun 10 '16 at 16:22
  • @J.F. Sebastian Fair question. Some security and testing concerns, but mostly want to integrate with a related Python codebase. – Daniel Standage Jun 15 '16 at 17:32
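
Following up on the comments above, one way to sidestep the extra `paste` and `tr` processes (and their two additional FIFOs) is to do the interleaving in Python itself. This is a minimal sketch, not a drop-in replacement. The names `interleave` and `run_wrapper` are hypothetical. It assumes a POSIX system, assumes the wrapped program opens its first output before its second, and assumes it writes to both at roughly the same pace, so that neither pipe fills up while the other is being read.

```python
import os
import sys
import tempfile
from itertools import islice
from subprocess import Popen


def interleave(stream1, stream2, out, n=4):
    """Write n lines from stream1, then n lines from stream2, repeatedly,
    until both streams are exhausted."""
    while True:
        chunk1 = list(islice(stream1, n))
        chunk2 = list(islice(stream2, n))
        if not chunk1 and not chunk2:
            break
        out.writelines(chunk1)
        out.writelines(chunk2)


def run_wrapper(program='./someProgram', out=None):
    """Run the program against two temporary FIFOs and interleave its output.

    Returns the program's exit status.
    """
    if out is None:
        out = sys.stdout
    tmpdir = tempfile.mkdtemp()
    path1 = os.path.join(tmpdir, 'out1')
    path2 = os.path.join(tmpdir, 'out2')
    os.mkfifo(path1)
    os.mkfifo(path2)
    try:
        # Start the writer first; each open() for reading below blocks
        # until the program opens the matching FIFO for writing.
        proc = Popen([program, path1, path2])
        with open(path1) as stream1, open(path2) as stream2:
            interleave(stream1, stream2, out)
        return proc.wait()
    finally:
        os.remove(path1)
        os.remove(path2)
        os.rmdir(tmpdir)
```

With the dummy `someProgram` from the question in the current directory, calling `run_wrapper()` should print four lines from the first output, then four from the second, and so on. Unlike the `paste` pipeline, this sketch emits no blank lines when one stream ends before the other.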
