3

I am having trouble passing a string in python (python variable) as input to a sequence alignment program (muscle) on the command line (bash). muscle can take stdin from the command line, e.g.;

~# echo -e ">1\nATTTCTCT\n>2\nATTTCTCC" | muscle

MUSCLE v3.8.31 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

- 2 seqs, max length 8, avg  length 8
00:00:00     22 MB(2%)  Iter   1  100.00%  K-mer dist pass 1
00:00:00     22 MB(2%)  Iter   1  100.00%  K-mer dist pass 2
00:00:00     23 MB(2%)  Iter   1  100.00%  Align node       
00:00:00     23 MB(2%)  Iter   1  100.00%  Root alignment
>1
ATTTCTCT
>2
ATTTCTCC

It is this fasta alignment (the last 4 lines) that I am after - you can redirect the output of muscle (echo -e ">1\nATTTCTCT\n>2\nATTTCTCC" | muscle > out.file to get just the fasta alignment, which I need for downstream processing. BUt to get there, I have to pass 'muscle' the FASTA sequence string, which I think is best done via echo as above in bash.

So, the script takes two multiFASTA files and and pairs each FASTA sequence based on a list of ids for each file - this is working (though I realize it is perhaps not the most efficient way to go about it - I am new python user). Then I need to align each set in muscle before calculating distances/difference.

Here is what I have so far :

#! /env/python
from pairwise_distances import K2Pdistance
import pairwise_distances
import subprocess
from subprocess import call
import os     

fasta1=['']
for line in open('test1.fasta'):
    if not line.startswith('>'):
         fasta1.append(line.strip())

fasta2=['']
for line in open('test2.fasta'):
    if not line.startswith('>'):
         fasta2.append(line.strip())

for l1, l2 in zip(open('test1.list'), open ('test2.list')):
 try:
   a=fasta1[int(l1)]
 except IndexError,e:  
   a="GGG"

 try:
   b=fasta2[int(l2)]
 except (IndexError):
   b="CCC"

 temp_align=str(">1"+'\n'+a+'\n'+">2"+'\n'+b) 

 first=subprocess.check_output(['echo','-e',temp_align])
 print first
 subprocess.call(['bash','muscle'], stdin=first.stdout)
 print second

 #new=K2Pdistance(outfast1,outfast2) 
 #subprocess.Popen(['bash','muscle'], stdin=subprocess.check_output(['echo','-e',temp_align], stdout=subprocess.PIPE).std.out)`

The 'temp_align' variable is what I want to pass to muscle - it is the result of combining the appropriate fasta sequence from each multiFASTA file, for each loop over the ids/list, and is formatted like a FASTA file.

This issue is that I can echo the FASTA string, but I cant seem to "pipe" that via stdin to muscle... The main error I get is: AttributeError: 'str' object has no attribute 'stdout'.

~#python Beta3.py 
>1
ATCGACTACT
>2
ATCGCGCTACT

Traceback (most recent call last):
  File "Beta3.py", line 38, in <module>
    subprocess.call(['bash','muscle'], stdin=first.stdout)
AttributeError: 'str' object has no attribute 'stdout'

Can subprocess or other commandline modules take strings as stdin? IF not, I am not sure how I can echo and then pipe into muscle... Any other ideas would be much appreciated. I tried a few options from this thread unix.stackexchange

but no luck yet.

EDIT: Here are some example files:

~# cat test1.fasta 
>1
ATCGACTACT
>2
ACTCAGTCA
>3
TTCACAGGG
~# cat test2.fasta 
>1
ATCGCGCTACT
>2
GGCGTCAGTCA
>3
TTCAACCCCAGGG
~# cat test1.list 
1
2
3
~# cat test2.list 
1
3
2

EDIT 2: That previous post is related, but my question is about linking two bash commands starting with a python variable that is a string...and then, ideally, capturing the stdout of the second command back into a python variable... I dont quite see how to translate the answers on that post into a solution for my specific question...I guess I don;t fully understand what the poster was trying to do.

Community
  • 1
  • 1
LP_640
  • 579
  • 1
  • 5
  • 17
  • `first` is a string, to send it to a command see: http://stackoverflow.com/a/165662/7311767 – Stephen Rauch Feb 25 '17 at 04:50
  • Why `subprocess.call(['bash','muscle'], stdin=first.stdout)` and not just `subprocess.call(['muscle'], stdin=first.stdout)`? – Diego Torres Milano Feb 25 '17 at 04:52
  • It doesn't make a difference (string error still present) , but I guess 'bash' is not needed there. – LP_640 Feb 25 '17 at 05:08
  • The result of `subprocess.check_output` is a string (stdout of the exec'd subprocess). You want to either use `subprocess.call` or provide a string like object as the stdin parameter which should simplify things. – stderr Feb 25 '17 at 18:02
  • This question is already closed as a duplicate, so I can't provide a full answer, but BioPython has a wrapper for `muscle` to do multiple sequence alignment, see [Chapter 6.4.2](http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc81) in the tutorial. – BioGeek Feb 25 '17 at 20:12

1 Answers1

3

It seems that you want to communicate with the muscle process, then you need a PIPE, use this

(out, err) = subprocess.Popen(['muscle'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate(temp_align)
print out
Diego Torres Milano
  • 65,697
  • 9
  • 111
  • 134