0

I have a python script as follow:

#!/usr/bin/python
from Bio import SeqIO

fasta_file = "input.fa" # Input fasta file
wanted_file = "A_ids.txt" # Input interesting sequence IDs, one per line
result_file = "A.fasta" # Output fasta file

wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")

I would like to run the script above for the same input file, but for 40 different wanted_files - with different names - A_ids.txt, B_ids.txt, etc. And I would like to have their respective different outputs - A.fasta, B.fasta, etc.

Do I need to change my python script or I need to create a loop to run it for all my wanted files?

thanks

Alex
  • 355
  • 1
  • 7
  • 20
  • 2
    You can certainly make a list of all your input files and the corresponding outputs and loop over it. If this is a process you expect to repeat, it might be worthwhile to change your script to read the filenames in from a file and then write a function to construct output filenames from that. – hoyland Nov 11 '16 at 17:04

3 Answers3

4

I agree with @BlackVegetable. Set it to use command line arguments, by doing something like this:

#!/usr/bin/python
from Bio import SeqIO

import sys # for sys.argv

fasta_file = sys.argv[1] # This is now going to be name.fa, the fasta file
wanted_file = sys.argv[2] # This is now going to be name_ids.txt, or whatever you passed
# as an argument
result_file = sys.argv[3] # Output fasta file, now passed as arg

wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")

Then you could call the program with python input.fa A_ids.txt A.fasta, in your case. Or, python inputB.fa B_ids.txt B.fasta.

0

Consider having this program taking commandline options. This would allow you to read the wanted_file name from the commandline as an argument and you could deduce the appropriate output file name by parsing the given argument and following a pattern (such as replace extension given with .fasta) or having the output pattern be another command line argument of some sort.

You could call your program as python my_script.py A_ids.txt and loop over that via bash. You could also choose to allow for a variable number of arguments, each of which would invoke your logic for the given name.

My preference for dealing with command line arguments is https://docs.python.org/3.3/library/argparse.html and https://docs.python.org/2/library/argparse.html depending on your python version.

(Additionally, if you take the path of using a single command line argument for the wanted_file, you could simply output the contents to stdout via print or similar functions and use a redirection operator in the command line to send the output to a filename provided there: python my_script.py A_ids.txt > A.fasta)

BlackVegetable
  • 12,594
  • 8
  • 50
  • 82
0

I think simpler way could be storing the 40 filenames in a file (in the code: wanted_filenames_file), store them in an array (wanted_files) and loop along each one of the files :

# !/usr/bin/python
from Bio import SeqIO

fasta_file = "input.fa"  # Input fasta file
wanted_filenames_file = "filenames.txt"
with open(wanted_filenames_file) as f:
    wanted_files = f.readlines()
result_file = []  # Output fasta file
for wanted_file in wanted_files:
    wanted = set()
    with open(wanted_file) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)

    fasta_sequences = SeqIO.parse(open(fasta_file), 'fasta')
    result_file = wanted_file.replace("_ids.txt", ".fasta")
    with open(result_file, "w") as f:
        for seq in fasta_sequences:
            if seq.id in wanted:
                SeqIO.write([seq], f, "fasta")
Benjy Malca
  • 597
  • 1
  • 9
  • 21
  • Did not work. I create a file with all file names that I want (filenames.txt) for the wanted files. I got some errors - from: can't read /var/mail/Bio ./modified.py: line 4: fasta_file: command not found ./modified.py: line 5: wanted_filenames_file: command not found ./modified.py: line 6: syntax error near unexpected token `(' ./modified.py: line 6: `with open(wanted_filenames_file) as f:' – Alex Nov 11 '16 at 19:38
  • @Clarissa are three errors raised at the same time?. I think i don't explained it clearly. You have to create a file called `filenames.txt`, where you should store all the filenames, one per line. – Benjy Malca Nov 11 '16 at 20:13
  • Yes I did it. I created a file called filenames.txt with all filenames inside, one per line. – Alex Nov 15 '16 at 21:24
  • But the 3 errors are raised at the same time?. ¿Could you share the stacktrace error in pastebin? – Benjy Malca Nov 16 '16 at 17:51