3

I would like to convert the following shell command into python code using subprocess. In particular, how to convert the multiple <(zcat ...) as stdin? Ideally, the code should use subprocess.call, but subprocess.Popen is fine, too.

rsem-calculate-expression \
-p 2 \
--some-other-option \
--paired-end \
<(zcat a_1.fastq.gz b_1.fastq.gz c_1.fastq.gz) \
<(zcat b_2.fastq.gz b_2.fastq.gz c_2.fastq.gz) \
~/index/hg19 \
output_dir \
1>stdin.log \
2>stderr.log
zyxue
  • 7,904
  • 5
  • 48
  • 74
  • While this might be possible, it would be much safer to do it all in Python, which provides well working `gzip` library and `logging` shall be able to write whenever you need. – Jan Vlcinsky May 07 '14 at 16:13
  • stupid question but why are there multiple zcat in the first place? – mofoe May 07 '14 at 16:13
  • Yeh, just realized multiple zcat could have been combined. I guess in this case, it's for clarity of the two categories of fastq.gz files, 1 and 2. – zyxue May 07 '14 at 16:16
  • rsem-calculate-expression is from a sophisticated C++ package, non-trivial to rewrite at all. – zyxue May 07 '14 at 16:29
  • How can those `zcat`s be combined? They are two separate arguments to the program. – kitti May 07 '14 at 16:30
  • `<(command)` doesn't write to subprocess' stdin it is passed as a named file to the program i.e., it is equivalent to `command_output.txt`. In Python, you could use a named pipe to emulate it. – jfs May 07 '14 at 23:00

2 Answers2

3

Another way to run the bash command is to use shell=True:

from subprocess import check_call

check_call('rsem-calculate-expression -p 2 --some-other-option '
           '--paired-end '
           '<(zcat a_1.fastq.gz b_1.fastq.gz c_1.fastq.gz) '
           '<(zcat b_2.fastq.gz b_2.fastq.gz c_2.fastq.gz) '
           '~/index/hg19 '
           'output_dir '
           '1>stdin.log '
           '2>stderr.log', shell=True, executable='/bin/bash')

For comparison, here's how to do it without running shell:

#!/usr/bin/env python3
import os
import shlex
from contextlib import ExitStack # $ pip install contextlib2 on Python 2
from shutil import rmtree
from subprocess import Popen
from tempfile import mkdtemp

# convert command-line into a list (program arguments)
args = 'rsem-calculate-expression -p2 --some-other-option --paired-end'.split()
zcat_args = [shlex.split('zcat a_1.fastq.gz b_1.fastq.gz c_1.fastq.gz'),
             shlex.split('zcat b_2.fastq.gz b_2.fastq.gz c_2.fastq.gz')]
npipes = len(zcat_args)
with ExitStack() as stack:
    # create named pipes
    pipenames = []
    dirname = mkdtemp() # create temporary directory for named pipes
    stack.callback(rmtree, dirname) # clean up at the end
    for i in range(npipes):
        pipename = os.path.join(dirname, 'zcat_named_pipe' + str(i))
        os.mkfifo(pipename)
        args.append(pipename) # append to the program args
        pipenames.append(pipename)

    # run rsem-calculate-expression that reads from the named pipes
    args.append(os.path.expanduser('~/index/hg19'))
    args.append('output_dir')
    with open('stdout.log', 'wb', 0) as f1, open('stderr.log', 'wb', 0) as f2:
        rsem = Popen(args, stdout=f1, stderr=f2)
    stack.callback(rsem.wait)

    # run zcat commands to populate the named pipes
    for pipename, cmd in zip(pipenames, zcat_args):
        if rsem.poll() is None: # it is still running and reading from pipes
            with open(pipename, 'wb', 0) as pipe: # may block if no readers
                stack.callback(Popen(cmd, stdout=pipe).wait)
exit(rsem.returncode != 0)
jfs
  • 399,953
  • 195
  • 994
  • 1,670
1

The easiest way is to just let bash handle the process substitutions:

subprocess.Popen(['/bin/bash', '-c', 'rsem-calculate-expression -p 2 \
    --some-other-option --paired-end \
    <(zcat a_1.fastq.gz b_1.fastq.gz c_1.fastq.gz) \
    <(zcat b_2.fastq.gz b_2.fastq.gz c_2.fastq.gz) \
    ~/index/hg19 output_dir'])
jfs
  • 399,953
  • 195
  • 994
  • 1,670
kitti
  • 14,663
  • 31
  • 49
  • note: bash doesn't see the backslashes -- they are interpreted by Python -- either escape them \\ or you could use implicit string literal concatenation instead e.g., `'a' 'b'` is the same as `'ab'`. – jfs May 07 '14 at 22:57
  • Bash isn't supposed to see those backslashes. They are escaping the newlines in the string. – kitti May 08 '14 at 13:59
  • the bash command in the question uses \ therefore the intent is not clear without a comment. pep-8 recommends against `\\` for Python code: [*"Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation."*](http://legacy.python.org/dev/peps/pep-0008/#maximum-line-length), see [my answer that uses implied string literals concatenation that are spread over multiple lines inside parentheses](http://stackoverflow.com/a/23530685/4279). – jfs May 08 '14 at 14:34
  • Backslash-newline is perfectly clear (and the supposed lack of clarity didn't stop you from answering with the exact same assumption). And I don't care about PEP 8. Just quit with the butthurt already; your answer wasn't accepted, move along. – kitti May 08 '14 at 14:57