
I'm using the subprocess module to call a program from the shell; the program writes a binary file to STDOUT.

I use Popen() to call the program, and then I want to pass the stream to a function in a Python package called "pysam" that unfortunately cannot read Python file objects but can read from STDIN. So what I'd like to do is have the output of the shell command go from STDOUT into STDIN.

How can this be done from within Popen/subprocess module? This is the way I'm calling the shell program:

p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout

This gives me a stream of "my_cmd"'s STDOUT output in p. Since my Python module cannot read from "p" directly, I am trying to redirect the STDOUT of "my_cmd" back into STDIN using:

p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, shell=True).stdout

I then call my module, which uses "-" as a placeholder for STDIN:

s = pysam.Samfile("-", "rb")

The above call just means read from STDIN (denoted "-") and read it as a binary file ("rb").

When I try this, I just get binary output sent to the screen, and it doesn't look like the Samfile() function can read it. This occurs even if I remove the call to Samfile, so I think it's my call to Popen that is the problem and not downstream steps.

EDIT: In response to answers, I tried:

sys.stdin = subprocess.Popen(tagBam_cmd, stdout=subprocess.PIPE, shell=True).stdout
print("Opening SAM..")
s = pysam.Samfile("-", "rb")
print("Done?")
sys.stdin = sys.__stdin__

This seems to hang. I get the output:

Opening SAM..

but it never gets past the Samfile("-", "rb") line. Any idea why, and how this can be fixed?

EDIT 2: I am adding a link to the Pysam documentation in case it helps; I really cannot figure this out. The documentation page is:

http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html

and the specific note about streams is here:

http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html#using-streams

In particular:

""" Pysam does not support reading and writing from true python file objects, but it does support reading and writing from stdin and stdout. The following example reads from stdin and writes to stdout:

infile = pysam.Samfile( "-", "r" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)

It will also work with BAM files. The following script converts a BAM formatted file on stdin to a SAM formatted file on stdout:

infile = pysam.Samfile( "-", "rb" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)

Note, only the file open mode needs to changed from r to rb. """

So I simply want to take the stream coming from Popen's stdout and redirect it into stdin, so that I can use Samfile("-", "rb") as the section above says is possible.

thanks.

  • `import sys`, `sys.stdin.write(p.stdout.read())`? Make sure to call `s = pysam.Samfile("-", "rb")` before writing to stdin. – John Doe Dec 11 '11 at 19:56
  • Are you still seeing binary output to the screen? In other words, are you positive that your tagBam_cmd is really sending the result to stdout? – David K. Hess Dec 11 '11 at 20:13
  • I'm not seeing binary output on the screen; I'm just seeing the program hang. When I run tagBam_cmd by itself, it prints its output to the screen, so I assume it's stdout. How can I check, though? –  Dec 11 '11 at 20:15
  • Run your tagBam_cmd in the shell as: `tagBam_cmd > test_output` and see if the binary ends up in the file. If not, then it may be going to stderr. – David K. Hess Dec 11 '11 at 20:18
  • I've done that and it definitely goes to the file. –  Dec 11 '11 at 20:21
  • It is easier to turn it inside out: `tagBam_cmd | sam_script` or [use a named pipe or `/dev/fd/#` file](http://stackoverflow.com/q/28840575/4279). – jfs Apr 23 '16 at 11:36
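The stdout-versus-stderr check suggested in the comments can also be done from Python. This is a minimal sketch, assuming Python 3.5+ (for `subprocess.run`); `output_sizes` is a hypothetical helper name. It sends stdout and stderr to separate temporary files and reports how many bytes landed in each, so with a large BAM producer it is best tried on a small input.

```python
import os
import subprocess
import tempfile


def output_sizes(cmd):
    """Run a shell command and return (stdout_bytes, stderr_bytes).

    Hypothetical helper: each stream goes to its own temporary file, so the
    output never has to fit in memory.
    """
    with tempfile.TemporaryFile() as out, tempfile.TemporaryFile() as err:
        subprocess.run(cmd, shell=True, stdout=out, stderr=err)
        return os.fstat(out.fileno()).st_size, os.fstat(err.fileno()).st_size


if __name__ == "__main__":
    # A stand-in command that writes 3 bytes to stdout and 2 bytes to stderr
    print(output_sizes("printf abc; printf de >&2"))
```

If the first number is zero and the second is not, the program is writing its data to stderr rather than stdout.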

3 Answers


In the specific case of dealing with pysam, I was able to work around the issue using a named pipe (http://docs.python.org/library/os.html#os.mkfifo), which is a pipe that can be accessed like a regular file. In general, you want the consumer (reader) of the pipe to listen before you start writing to the pipe, to ensure you don't miss anything. However, pysam.Samfile("-", "rb") will hang as you noted above if nothing is already registered on stdin.

Assuming you're dealing with a prior computation that takes a decent amount of time (e.g. sorting the bam before passing it into pysam), you can start that prior computation and then listen on the stream before anything gets output:

import os
import tempfile
import subprocess
import shutil
import pysam

# Create a named pipe
tmpdir = tempfile.mkdtemp()
samtools_prefix = os.path.join(tmpdir, "namedpipe")
fifo = samtools_prefix + ".bam"
os.mkfifo(fifo)

# The example below sorts the file 'input.bam',
# creates a pysam.Samfile object of the sorted data,
# and prints out the name of each record in sorted order

# Your prior process that spits out data to stdout/a file
# We pass samtools_prefix as the output prefix, knowing that its
# ending file will be named what we called the named pipe
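# (older samtools `sort <in.bam> <out.prefix>` syntax; newer releases use -o)
sort_proc = subprocess.Popen(["samtools", "sort", "input.bam", samtools_prefix])

# Read from the named pipe (opening it blocks until samtools opens it for writing)
samfile = pysam.Samfile(fifo, "rb")

# Print out the names of each record
for read in samfile:
    print(read.qname)

# Close the reader and reap the samtools process
samfile.close()
sort_proc.wait()

# Clean up the named pipe and associated temp directory
shutil.rmtree(tmpdir)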
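The named-pipe pattern itself does not depend on pysam or samtools. The sketch below assumes a POSIX system (for `os.mkfifo`) and Python 3; `read_via_fifo` is a hypothetical helper, and a `sh -c` one-liner stands in for the real writer. It shows the property the answer relies on: opening a FIFO for reading blocks until some process opens it for writing (and vice versa), so the reader cannot miss early output.

```python
import os
import shutil
import subprocess
import tempfile


def read_via_fifo(cmd):
    """Run cmd (a list) with a FIFO path appended as its last argument and
    return everything the command writes into that FIFO.

    Hypothetical convention for this sketch: cmd writes its output to the
    path it receives as its final argument.
    """
    tmpdir = tempfile.mkdtemp()
    fifo = os.path.join(tmpdir, "stream.fifo")
    os.mkfifo(fifo)
    try:
        writer = subprocess.Popen(cmd + [fifo])
        # open() blocks here until the writer opens the FIFO for writing.
        # If the writer dies before opening it, this would block forever.
        with open(fifo, "rb") as f:
            data = f.read()
        writer.wait()
        return data
    finally:
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    # With `sh -c SCRIPT ARG`, ARG becomes $0 inside SCRIPT
    print(read_via_fifo(["sh", "-c", 'printf hello > "$0"']))
```

In the answer above, `samtools sort` plays the writer and `pysam.Samfile(fifo, "rb")` plays the reader.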
Cory
  • if a system supports it then it might be easier to [use `/dev/fd/#` filenames](http://stackoverflow.com/a/36810354/4279). – jfs Apr 23 '16 at 11:38

I'm a little confused that you see binary on stdout if you are using stdout=subprocess.PIPE. However, the overall problem is that you need to work with sys.stdin if you want to trick pysam into using it.

For instance:

import subprocess
import sys
sys.stdin = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
s = pysam.Samfile("-", "rb")
sys.stdin = sys.__stdin__  # restore original stdin

UPDATE: This assumed that pysam runs in the context of the Python interpreter, so that "-" would mean the Python interpreter's stdin. Unfortunately, it doesn't; when "-" is specified, it reads directly from file descriptor 0.

In other words, it does not use Python's concept of stdin (sys.stdin), so replacing sys.stdin has no effect on pysam.Samfile(). It is also not possible to take the output from the Popen call and somehow "push" it onto file descriptor 0; that descriptor is read-only, and its other end is connected to your terminal.
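The difference between `sys.stdin` and file descriptor 0 can be checked without pysam. In this minimal sketch (Python 3.5+; `fd0_vs_sys_stdin` is a hypothetical name), a child interpreter replaces `sys.stdin` with an in-memory stream and then reads fd 0 directly with `os.read`, roughly as C code opening "-" would. The replacement is visible to the Python-level read but not to the descriptor-level read.

```python
import subprocess
import sys

# Child program: swap sys.stdin for an in-memory stream, then read both
# sys.stdin (Python level) and file descriptor 0 (OS level).
CHILD = """
import io, os, sys
sys.stdin = io.StringIO("from sys.stdin")
sys.stdout.write(sys.stdin.read() + "|")
sys.stdout.write(os.read(0, 100).decode())
"""


def fd0_vs_sys_stdin():
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=b"from fd 0",  # this is what the child's fd 0 is connected to
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode()


if __name__ == "__main__":
    print(fd0_vs_sys_stdin())
```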

The only real way to get that output onto file descriptor 0 is to move the pysam code into an additional script and have the first script connect the two together. That ensures that the output from the Popen in the first script ends up on file descriptor 0 of the second one.

So, in this case, your best option is to split this into two scripts. The first one will invoke my_cmd and take the output of that and use it for the input to a second Popen of another Python script that invokes pysam.Samfile("-", "rb").
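The two-script arrangement is the "replacing a shell pipeline" pattern from the subprocess documentation: the first process's stdout pipe is passed to the second Popen as its stdin, which makes it that child's file descriptor 0. A minimal sketch, assuming Python 3; `run_pipeline` is a hypothetical helper. In the real setup the producer would be `my_cmd` and the consumer something like `["python", "read_bam.py"]`, where `read_bam.py` is a hypothetical script that calls `pysam.Samfile("-", "rb")`.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd):
    """Connect producer's stdout to consumer's stdin (fd 0); return consumer's stdout."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Drop the parent's copy of the pipe so the producer gets SIGPIPE
    # if the consumer exits early.
    producer.stdout.close()
    out, _ = consumer.communicate()
    producer.wait()
    return out


if __name__ == "__main__":
    # Stand-ins: the producer plays my_cmd; the consumer plays the pysam
    # script and just counts the bytes arriving on its stdin (fd 0).
    producer = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'abc')"]
    consumer = [sys.executable, "-c",
                "import sys; sys.stdout.write(str(len(sys.stdin.buffer.read())))"]
    print(run_pipeline(producer, consumer))
```

Because the pipe becomes the consumer's real fd 0, code inside it that reads "-" at the C level sees the producer's output, which is what the single-script `sys.stdin` swap could not achieve.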

David K. Hess
  • Although I rather preferred just writing to `sys.stdin` instead (see my comment on main post). – John Doe Dec 11 '11 at 20:03
  • That's not going to work. You would be trying to write to a file stream object that's read-only, not pushing data on for future reads. Now, if you opened /dev/fd/0 and wrote to that, it could work. – David K. Hess Dec 11 '11 at 20:05
  • @DavidK.Hess: I tried your suggestion but it seems to hang at Samfile()... does it work for you? See EDIT to my original post. –  Dec 11 '11 at 20:12
  • @user248237 take a look at my edit: are you sure that pysam is reading from the interpreter's stdin, or is it running an external program with those arguments? A link to the docs for pysam.Samfile would help too. – David K. Hess Dec 11 '11 at 20:27
  • @DavidK.Hess: Adding links. Relevant part is here: http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html#using-streams –  Dec 11 '11 at 20:32
  • @DavidK.Hess: pysam calls a program called "samtools" under the hood - would that be the problem? If the manual is right, it should read from stdin, and so if I redirect stdin to the pipe coming from Popen I don't see why it shouldn't work... –  Dec 11 '11 at 20:35
  • Poking in the source code, it looks like it reads from fd 0 in this case, not Python's stdin, so this won't work. At this point, your easiest route is to take the output from the first program, write it to a file and give that to pysam.Samfile. – David K. Hess Dec 11 '11 at 20:45
  • @DavidK.Hess: :( This is terrible design unfortunately since the files I'm working with are so large, so it is really wasteful. If that's the last resort, do you recommend using the tempfile package and making it a temporary file? Keep in mind these files are ~10-20 GB –  Dec 11 '11 at 20:53
  • In that case, write two Python scripts: one that invokes tagBam_cmd and passes its stdout to another Popen of a second Python script that does pysam.Samfile("-", "rb"). – David K. Hess Dec 11 '11 at 21:03
  • @DavidK.Hess: Could you explain how the two Python script solution gets around the issue? Why does having two script fix it? I'm not sure I follow. Thanks again –  Dec 12 '11 at 20:10

If your system supports it, you could use /dev/fd/# filenames:

import subprocess
import pysam
process = subprocess.Popen(args, stdout=subprocess.PIPE)
samfile = pysam.Samfile("/dev/fd/%d" % process.stdout.fileno(), "rb")
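The `/dev/fd/#` trick can be checked without pysam: the path names the read end of the pipe that Popen created, so code that only accepts a filename can open it. A minimal sketch, assuming a system where `/dev/fd` exists (e.g. Linux or macOS) and Python 3; `read_via_dev_fd` is a hypothetical helper, and plain `open()` stands in for `pysam.Samfile`.

```python
import subprocess
import sys


def read_via_dev_fd(cmd):
    """Read a command's stdout by opening /dev/fd/<n> as if it were a file."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    path = "/dev/fd/%d" % proc.stdout.fileno()
    # Opening the path yields a new file object that reads from the same pipe
    with open(path, "rb") as f:
        data = f.read()
    proc.stdout.close()
    proc.wait()
    return data


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'xyz')"]
    print(read_via_dev_fd(cmd))
```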
jfs