I am trying to write a pipeline in Python that makes proper use of subprocess without invoking shell=True. One common task in bioinformatics is to align sequences with a program such as bwa and pass the SAM-formatted results on to samtools for downstream processing. The Python package pysam lets me do everything samtools does, but from within Python.
I would like to pass the results from a call to the aligner bwa to pysam without having to write to a file.
Pysam allows you to open a Samfile object with the input file set to "-", in which case it reads from stdin. Similarly, bwa writes its results to stdout.
The way I have written it thus far is:
bwa_call = ["bwa", "mem", "-v", "1", "-t", str(cores), index, fwd, rev]
bwa = subprocess.Popen(bwa_call, stdout=subprocess.PIPE)
samfile = pysam.Samfile("-", "r")
This seems to work, in that I don't see the stdout output from bwa, but the problem is that pysam does not know when the file is finished and so just keeps waiting.
Is there a way to pass the stdout from bwa directly to pysam without writing to a file?
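A note on the likely cause, with a minimal sketch: stdout=subprocess.PIPE creates a new pipe whose read end is only reachable as bwa.stdout, whereas "-" refers to the stdin of the Python process itself, so nothing bwa writes ever arrives there. The sketch below uses only the standard library and a hypothetical stand-in command (fake_aligner, a small Python child process printing two SAM-like lines) in place of bwa, and a hypothetical helper read_child_stdout, to show that reading from the Popen object's stdout handle ends cleanly at EOF once the child exits. Whether that handle can be given to pysam directly is an assumption to check against the installed pysam version.

```python
import subprocess
import sys

# Hypothetical stand-in for the bwa call: a child process that prints two
# SAM-like lines to its stdout and exits. In the real pipeline this list
# would be bwa_call.
fake_aligner = [
    sys.executable,
    "-c",
    "print('@HD\\tVN:1.0'); print('read1\\t0\\tchr1\\t100')",
]


def read_child_stdout(cmd):
    """Run cmd and return the lines it wrote to stdout, read via the pipe."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    # proc.stdout is the read end of the pipe. Iteration stops at EOF,
    # which arrives once the child closes its stdout (normally on exit).
    # Assumption: a pysam reader would be given this handle, not "-".
    lines = [line.rstrip("\n") for line in proc.stdout]
    proc.stdout.close()
    returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
    return lines


lines = read_child_stdout(fake_aligner)
print(lines)
```

The point of the sketch is that the reader must be attached to the pipe owned by the Popen object; the parent's own stdin stays open for as long as the terminal or calling process keeps it open, which would explain the indefinite wait.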