Bash process substitution in Python with subprocess.run() and shared input

Question

I have a question about Bash process substitution in Python using subprocess. I'm trying to write it so that both the main command and the command inside the process substitution read the same input from stdin (which in the code is a string variable). Here is the code:

p1 = subprocess.run(['cat'],
     stdout=subprocess.PIPE, input=in_fa.encode())
p2 = subprocess.run(['bwa samse reference/C57BL_6J.fa <(bwa aln -l 20 reference/C57BL_6J.fa -) -'],
     shell=True, executable="/bin/bash", input=p1.stdout,
     stdout=subprocess.PIPE)

In this example, in_fa is a string like the following:

 >header\ntTCAGCCTTCCCTTCCATTTCTCTCCCCTTCCCTCTCCTCCCCATTTCAGAGTTTCTTTAGAATCTGTATTCTGGCACCCAAAGTGAACTATGTGTCTGACTCAGGGGCTCTTTGTTTCACTGCAGGGCTGTGGTG

In this code, both '-' in the main process and the subprocess refer to in_fa, but while the main process is reading it correctly, the subprocess is not.

This, for example, would work, but it's not dynamic and it reads from a file rather than from a variable:

p1 = subprocess.run(['''cat fasta/input.fasta |
    bwa samse reference/C57BL_6J.fa <(
        cat fasta/input.fasta |
          bwa aln -l 20 reference/C57BL_6J.fa -) -'''],
    shell=True, executable="/bin/bash", stdout=subprocess.PIPE)

Any help would be appreciated! Meanwhile, I will keep trying.

You appear to have a race condition, with two commands in `p2` both trying to read from the same standard input. Also, what is the point of `p1`, instead of using `in_fa.encode()` as the input to `p2` directly? — chepner, Aug 23 '18 at 13:04
I broke up your code for legibility; these long one-liners are disappearing into the right margin on the desktop version of this site, though they look fine on mobile. — tripleee, Aug 23 '18 at 13:17
You are right, using 'cat' and 'p1' was just a very bad practice, totally redundant. I still have many things to fix in my coding! — atorreso, Aug 23 '18 at 13:20

tripleee · Accepted Answer · 2018-08-23T13:36:43.450

You cannot consume standard input from two distinct processes; they need to receive a copy each.
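
To illustrate why, here is a minimal sketch (not part of the original code, and assuming a POSIX `sh` and `cat` are available) in which two readers share one standard input. The first `cat` drains the stream, so the second one sees end-of-file immediately and prints nothing. When the two readers run concurrently, as with process substitution, which reader gets which bytes is unpredictable.

```python
import subprocess

# Two commands inherit the same standard input from the shell.
# The first cat reads everything up to end-of-file and discards it;
# the second cat then finds the stream already exhausted.
proc = subprocess.run(
    ["sh", "-c", "cat >/dev/null; cat"],
    input=b"shared input\n",
    stdout=subprocess.PIPE,
    check=True,
)

# Whatever the second reader managed to read ends up here.
second_reader_output = proc.stdout
print(repr(second_reader_output))
```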

My approach to this would be to write the string to a temporary file and take it from there.

In addition, your subprocess calls have a couple of problems.

You need to pass in either a string (with shell=True) or a list of tokens (without a shell). What you have, a list containing a single string combined with shell=True, looks like it's working, but it's really not well-defined.
The cat performs no useful purpose here; the purpose of cat is to combine multiple files, and you only have one file. (It wasn't useful in the shell either.)
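
As a small illustration of the first point (a sketch assuming a POSIX system; the `echo` commands are made up for the demonstration): when a list is combined with `shell=True`, only the first item is treated as the shell script, and any further items become arguments to the shell itself, starting at `$0`.

```python
import subprocess

# List plus shell=True: the first item is the script, and the second item
# becomes the shell's $0 rather than an argument to echo.
as_list = subprocess.run(
    ['echo "got: $0"', "surprise"],
    shell=True, stdout=subprocess.PIPE, check=True,
)

# A single string with shell=True: the shell parses the whole command line.
as_string = subprocess.run(
    'echo "got: surprise"',
    shell=True, stdout=subprocess.PIPE, check=True,
)

# A list of tokens without a shell: each item is passed to echo as-is.
as_tokens = subprocess.run(
    ["echo", "got: surprise"],
    stdout=subprocess.PIPE, check=True,
)

print(as_list.stdout, as_string.stdout, as_tokens.stdout)
```

The one-element list in the question happens to behave like the string form because there are no extra items, which is why it appears to work.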

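import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as tmpdirname:
    # os.path.join takes the path components as separate arguments, not a list
    fa_tmp = os.path.join(tmpdirname, 'in.fa')
    with open(fa_tmp, 'wb') as handle:
        handle.write(in_fa.encode())
    # Both bwa invocations read the temporary file instead of sharing stdin.
    # The final {0} stays on the same line as the closing parenthesis so that
    # Bash does not treat it as a separate command.
    proc = subprocess.run(
        '''bwa samse reference/C57BL_6J.fa <(
              bwa aln -l 20 reference/C57BL_6J.fa {0}) {0}'''.format(fa_tmp),
        shell=True, executable="/bin/bash",
        check=True, stdout=subprocess.PIPE)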

See also Running Bash commands in Python where I have an answer which outlines some of the problems you are having in more detail.

Thanks for your answer! I was hoping that two processes could read from the same input and avoid creating a temporary file. I will implement your solution instead. — atorreso, Aug 23 '18 at 13:34
I added `check=True` but this could still fail silently if the process substitution (the stuff in `<(...)`) has an error. — tripleee, Aug 23 '18 at 13:37
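
One way to avoid the silent failure mentioned in the last comment is to drop the process substitution and run the two `bwa` steps separately, each with `check=True`. The following is a minimal sketch, not the answer's own method. It assumes `bwa` is on the `PATH`. The function name `align_single_end` and the intermediate file name `in.sai` are hypothetical.

```python
import os
import subprocess
import tempfile


def align_single_end(in_fa, reference="reference/C57BL_6J.fa"):
    """Run bwa aln and bwa samse as two separately checked steps."""
    with tempfile.TemporaryDirectory() as tmpdirname:
        fa_tmp = os.path.join(tmpdirname, "in.fa")
        sai_tmp = os.path.join(tmpdirname, "in.sai")  # hypothetical name
        with open(fa_tmp, "w") as handle:
            handle.write(in_fa)

        # Step 1: bwa aln writes its result to standard output, which is
        # captured in a file. check=True raises if this step fails.
        with open(sai_tmp, "wb") as sai:
            subprocess.run(
                ["bwa", "aln", "-l", "20", reference, fa_tmp],
                stdout=sai, check=True)

        # Step 2: bwa samse takes the reference, the file from step 1,
        # and the reads. No shell is involved, so token lists are used.
        proc = subprocess.run(
            ["bwa", "samse", reference, sai_tmp, fa_tmp],
            stdout=subprocess.PIPE, check=True)
    return proc.stdout
```

A call such as `sam = align_single_end(in_fa)` raises `subprocess.CalledProcessError` if either step exits with a non-zero status, or `FileNotFoundError` if `bwa` is not installed, rather than returning partial output.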
