I have a long series of files that I need to analyze in batch using a Python script, e.g.:

all_transcriptomes.fasta
OFAS000003-RA_rbh.fasta
OFAS000101-RA_rbh.fasta
OFAS000115-RA_rbh.fasta
OFAS000119-RA_rbh.fasta

And so on. I need to query each *rbh.fasta against the all_transcriptomes.fasta.

Here is the beginning section of the script I'm using to perform the analysis (up to specifying variables):

#!/usr/bin/env python
Usage = """This is a program to do an entire reciprocal BLAST search.
The nomenclature is as follows:
Sequences A vs Sequences B is the BLAST search
Step one = Seqs A vs BLAST database of Seqs B
Step two (this step) = Seqs B vs BLAST database of Seqs A

The columns of the results table MUST be as follows:

Column 1: query id
Column 2: query length
Column 3: subject id
Column 4: subject length
Column 5: bitscore
Column 6: e-value
Column 7 onwards can be however you like. 

"""

#import any modules needed
import subprocess

#Give information required
print("\nA program to run a full reciprocal BLAST program. \n"
      "If you are having any problems, it is worth reading the usage for the assumptions and layout\n\n")

#The inputs required are as follows:
#1. Project Name
#2. Name of the fasta files containing the first [A] set of sequences (must be in the working folder)
#3. Name of the fasta files containing the second [B] set of sequences (must be in the working folder)
#4. What kind of BLAST search are you doing?
#5. What e-value cut-off you would like?

#Get required inputs
ProjectName = "project_name"

SeqsFastaA = "query_file_name.fasta"
#SeqsFastaA = "temp_seqs.fasta"

SeqsFastaB = "all_transcriptomes.fasta"
#SeqsFastaB = "temp_seqs.fasta"

BlastType = "blastn"
#BlastType = "blastn"

Evalue = "1e-20"
#Evalue = 1e-05

Because I lack Python scripting experience, I do not know how to get this script to use each OFAS*.fasta file in turn as SeqsFastaA. I would also like ProjectName to match the OFAS*.fasta file name, but without the .fasta extension. Can someone help me modify this script so that it does this when called from a batch shell script?

cdarke
Mike F
  • Read this to get the files in your directory: [How to list all files of a directory in Python](http://stackoverflow.com/questions/3207219/how-to-list-all-files-of-a-directory-in-python). Then, with the list of files, [filter the strings for project name and coincidence to compare](http://stackoverflow.com/questions/21903842/how-do-i-compare-two-strings-in-python), and with that solved, [open the files](http://www.tutorialspoint.com/python/python_files_io.htm) and [compare the content](http://stackoverflow.com/questions/19007383/compare-two-different-files-line-by-line-in-python). – DIEGO CARRASCAL Jul 25 '16 at 15:46
  • There is a python module for fasta processing, see https://pypi.python.org/pypi/pyfasta/. If you Google "python fasta" you will find many links that should help. Other search engines are available. – cdarke Jul 25 '16 at 15:49
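
The file-listing step suggested in the comments could be sketched roughly as follows. This is only a minimal illustration, not a tested replacement for the script: the helper name `list_jobs` is hypothetical, and it assumes the query files all match `OFAS*.fasta` and sit in the working folder alongside `all_transcriptomes.fasta`.

```python
import glob
import os


def list_jobs(directory=".", pattern="OFAS*.fasta"):
    """Return (ProjectName, SeqsFastaA) pairs, one per matching query file.

    list_jobs is a hypothetical helper, not part of the original script.
    """
    jobs = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        file_name = os.path.basename(path)
        # Drop only the final extension, e.g.
        # "OFAS000003-RA_rbh.fasta" becomes "OFAS000003-RA_rbh".
        project_name = os.path.splitext(file_name)[0]
        jobs.append((project_name, path))
    return jobs


if __name__ == "__main__":
    SeqsFastaB = "all_transcriptomes.fasta"
    for ProjectName, SeqsFastaA in list_jobs():
        # The remainder of the original script (not shown in the question)
        # would run here, once per query file.
        print("%s: %s vs %s" % (ProjectName, SeqsFastaA, SeqsFastaB))
```

Because the pattern starts with `OFAS`, `all_transcriptomes.fasta` is never picked up as a query. If the loop should live in the shell script instead, an alternative would be to read the query file name from `sys.argv[1]` and derive `ProjectName` from it with the same `os.path.splitext` call.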
