I have a question about Bash process substitution in Python using subprocess. I'm trying to write it so that both the main command and the process substitution read the same input from stdin (which in the code is a string variable). Here is the code:

p1 = subprocess.run(['cat'],
     stdout=subprocess.PIPE, input=in_fa.encode())
p2 = subprocess.run(['bwa samse reference/C57BL_6J.fa <(bwa aln -l 20 reference/C57BL_6J.fa -) -'],
     shell=True, executable="/bin/bash", input=p1.stdout,
     stdout=subprocess.PIPE)

In this example, in_fa is a string like the following:

 >header\ntTCAGCCTTCCCTTCCATTTCTCTCCCCTTCCCTCTCCTCCCCATTTCAGAGTTTCTTTAGAATCTGTATTCTGGCACCCAAAGTGAACTATGTGTCTGACTCAGGGGCTCTTTGTTTCACTGCAGGGCTGTGGTG

In this code, both '-' in the main process and the subprocess refer to in_fa, but while the main process is reading it correctly, the subprocess is not.

This, for example, would work, but it's not dynamic, and it reads from a file rather than from a variable:

p1 = subprocess.run(['''cat fasta/input.fasta |
    bwa samse reference/C57BL_6J.fa <(
        cat fasta/input.fasta |
          bwa aln -l 20 reference/C57BL_6J.fa -) -'''],
    shell=True, executable="/bin/bash", stdout=subprocess.PIPE)

Any help would be appreciated! Meanwhile, I will keep trying.

tripleee
atorreso
  • You appear to have a race condition, with two commands in `p2` both trying to read from the same standard input. Also, what is the point of `p1`, instead of using `in_fa.encode()` as the input to `p2` directly? – chepner Aug 23 '18 at 13:04
  • I broke up your code for legibility; these long one-liners are disappearing into the right margin on the desktop version of this site, though they look fine on mobile. – tripleee Aug 23 '18 at 13:17
  • You are right, using 'cat' and 'p1' was just a very bad practice, totally redundant. I still have many things to fix in my coding! – atorreso Aug 23 '18 at 13:20

1 Answer

Two distinct processes cannot both consume the same standard input stream: whatever one of them reads is gone for the other. They need to receive a copy each.
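To make the point concrete, here is a minimal sketch that uses a plain `os.pipe()` in place of a real stdin. The helper name `demo_shared_pipe` and the sample data are made up for illustration. Bytes taken by the first reader are no longer available to the second.

```python
import os

def demo_shared_pipe(data: bytes, first_chunk: int = 5):
    """Hypothetical helper: two successive readers share one pipe."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, data)
    os.close(write_fd)  # signal end of input
    first = os.read(read_fd, first_chunk)  # what "reader A" consumes
    second = os.read(read_fd, 65536)       # "reader B" only sees the rest
    os.close(read_fd)
    return first, second

first, second = demo_shared_pipe(b">header\nACGT\n")
print(first, second)
```

Neither reader gets the complete input, which is roughly what happens when two commands compete for one stdin.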

My approach to this would be to write the string to a temporary file and take it from there.

In addition, your subprocess calls have a couple of problems.

  • You need to pass in either a string (with `shell=True`) or a list of tokens (without it). What you have, a list together with `shell=True`, looks like it's working, but it's really not well-defined.

  • The cat performs no useful purpose here; the purpose of cat is to combine multiple files, and you only have one file. (It wasn't useful in the shell either.)
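The first point can be sketched as follows. This is only an illustration with a harmless `echo` command, not the bwa call itself, and the function names are hypothetical.

```python
import subprocess
import sys

def run_with_shell(command: str) -> str:
    """With shell=True, pass the whole command line as one string."""
    proc = subprocess.run(command, shell=True, check=True,
                          stdout=subprocess.PIPE)
    return proc.stdout.decode().strip()

def run_without_shell(tokens: list) -> str:
    """Without a shell, pass a list of already-split tokens."""
    proc = subprocess.run(tokens, check=True, stdout=subprocess.PIPE)
    return proc.stdout.decode().strip()

print(run_with_shell("echo hello"))
print(run_without_shell([sys.executable, "-c", "print('hello')"]))
```

Shell features such as `<(...)` are only available in the first form, and they need `executable="/bin/bash"` because the default `/bin/sh` may not support process substitution.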

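import os
import subprocess
import tempfile

# Write the input to a temporary file so each bwa process reads its own copy.
with tempfile.TemporaryDirectory() as tmpdirname:
    # Path of the temporary FASTA file inside the scratch directory.
    fa_tmp = os.path.join(tmpdirname, 'in.fa')
    with open(fa_tmp, 'wb') as handle:
        handle.write(in_fa.encode())
    # The final argument must stay on the same line as the closing
    # parenthesis; a newline there would end the bwa samse command.
    proc = subprocess.run(
        '''bwa samse reference/C57BL_6J.fa <(
              bwa aln -l 20 reference/C57BL_6J.fa {0}) {0}'''.format(fa_tmp),
        shell=True, executable="/bin/bash",
        check=True, stdout=subprocess.PIPE)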
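
Note that `check=True` only looks at the exit status of the outer command, so an error inside `<(...)` can pass unnoticed. One possible way around that, sketched here under assumptions, is to run the two steps as separate `subprocess.run` calls and hand the intermediate result over through a second temporary file. The helper `run_two_step` and its parameters are hypothetical, and the bwa invocation in the trailing comment is untested.

```python
import os
import subprocess
import sys
import tempfile

def run_two_step(inner_cmd, make_outer_cmd, text):
    """Hypothetical helper: run two commands in sequence, each checked.

    inner_cmd      -- token list; the input file path is appended to it
    make_outer_cmd -- callable (inner_output_path, input_path) -> token list
    text           -- the input data as a string
    """
    with tempfile.TemporaryDirectory() as tmpdirname:
        in_path = os.path.join(tmpdirname, "in.fa")
        mid_path = os.path.join(tmpdirname, "inner.out")
        with open(in_path, "wb") as handle:
            handle.write(text.encode())
        # Step 1: an error here raises CalledProcessError instead of being lost.
        with open(mid_path, "wb") as mid:
            subprocess.run(inner_cmd + [in_path], stdout=mid, check=True)
        # Step 2: the outer command reads both files by name.
        outer = subprocess.run(make_outer_cmd(mid_path, in_path),
                               stdout=subprocess.PIPE, check=True)
    return outer.stdout

# Stand-in commands so the sketch runs without bwa installed.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(open(sys.argv[1]).read().upper())"]

def concat(mid, src):
    return [sys.executable, "-c",
            "import sys; sys.stdout.write(open(sys.argv[1]).read()"
            " + open(sys.argv[2]).read())", mid, src]

result = run_two_step(upper, concat, "acgt")
print(result)

# With bwa this might look like (assumed, not tested here):
# run_two_step(["bwa", "aln", "-l", "20", "reference/C57BL_6J.fa"],
#              lambda sai, fa: ["bwa", "samse", "reference/C57BL_6J.fa", sai, fa],
#              in_fa)
```

The trade-off is one more temporary file, in exchange for each step being checked on its own and no shell being involved at all.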

See also "Running Bash commands in Python", where I have an answer which outlines some of the problems you are having in more detail.

tripleee
  • Thanks for your answer! I was hoping that two processes could read from the same input and avoid creating a temporary file. I will implement your solution instead. – atorreso Aug 23 '18 at 13:34
  • I added `check=True` but this could still fail silently if the process substitution (the stuff in `<(...)`) has an error. – tripleee Aug 23 '18 at 13:37