
I am writing a program that requires a .fasta file as input at an early stage. Right now, I validate the input with this function:

import os

def file_validation(fasta):
    # `fasta` is the prompt text shown to the user.
    while True:
        file_name = str(raw_input(fasta))

        # raw_input() never raises IOError, so check for the file explicitly.
        if not os.path.isfile(file_name):
            print("Please give the name of the fasta file that exists in the folder!")
        elif not file_name.endswith(".fasta"):
            print("Please give the name of the file with the .fasta extension!")
        else:
            break
    return file_name

This function works, but it only checks the file name: a user could still supply a file whose name ends in .fasta but whose contents are not in FASTA format. How can I detect this and tell the user that their .fasta file is corrupted?

Arya McCarthy
Bob McBobson
  • Write a fasta parser or find an existing one online. – Kevin May 31 '17 at 19:43
  • I know *nothing* about fasta myself. However, there's an answer that might be useful to you at https://stackoverflow.com/a/7655072/131187. I would say, try to parse a few hundred characters, assuming that makes sense. Accept the file if the parse succeeds. – Bill Bell May 31 '17 at 19:46
  • @Kevin I tried using the Biopython SeqIO 'SeqIO.parse(file_name, 'fasta')' as part of the conditional tree in order to see whether parsing was possible. Sadly, it does not warn the user that the file is not a valid .fasta file. Is this what you meant? – Bob McBobson May 31 '17 at 20:22
  • @BillBell Sadly, I am not yet at a level to understand certain parts of the code you linked. Do you think I could copy and paste it directly into my function so that it tells the user that their file is not in the proper .fasta format? – Bob McBobson May 31 '17 at 20:33
  • I see that your question has been answered! – Bill Bell Jun 01 '17 at 14:09
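
The suggestion above to parse only the start of the file can be sketched without any third-party library. This is a rough heuristic, not a full FASTA parser: the function name `looks_like_fasta`, the `max_lines` limit, and the set of allowed sequence characters are all assumptions made for illustration, and real files may legitimately use characters outside that set.

```python
def looks_like_fasta(path, max_lines=50):
    """Heuristic check of the first `max_lines` lines of a file.

    Assumes a FASTA file starts with a '>' header line and that sequence
    lines contain only letters, '*' (stop) and '-' (gap).
    """
    allowed = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ*-")  # assumption, see lead-in
    seen_header = False
    seen_sequence = False
    with open(path, "r") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= max_lines:
                break
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                seen_header = True
            elif not seen_header:
                return False  # content before the first header
            elif set(line.upper()) <= allowed:
                seen_sequence = True
            else:
                return False  # unexpected characters in a sequence line
    # Require at least one header and one sequence line.
    return seen_header and seen_sequence
```

A CSV file fails this check on its first line because that line does not start with '>', and a file containing only a header fails because no sequence line follows it.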

1 Answer


Why not just parse the file as if it were FASTA and see whether it breaks?

Using Biopython, which fails silently on non-FASTA files by returning an empty generator:

from Bio import SeqIO

my_file = "example.csv"  # Obviously not FASTA

def is_fasta(filename):
    with open(filename, "r") as handle:
        fasta = SeqIO.parse(handle, "fasta")
        return any(fasta)  # False when `fasta` is empty, i.e. wasn't a FASTA file

is_fasta(my_file)
# False
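
One possible way to call a check like `is_fasta` from a prompt loop is sketched below. It assumes Python 3 (`input` rather than `raw_input`); the function name `prompt_for_fasta` and its `is_valid` and `read_input` parameters are hypothetical, introduced only so the content check and the input source can be swapped out.

```python
import os

def prompt_for_fasta(prompt, is_valid, read_input=input):
    """Ask repeatedly until the user names an existing .fasta file
    that passes the `is_valid` content check."""
    while True:
        file_name = read_input(prompt)
        if not os.path.isfile(file_name):
            print("Please give the name of a fasta file that exists in the folder!")
        elif not file_name.endswith(".fasta"):
            print("Please give the name of the file with the .fasta extension!")
        elif not is_valid(file_name):
            print("That file does not appear to contain FASTA records.")
        else:
            return file_name
```

With the function above, the call would look like file_name = prompt_for_fasta("FASTA file: ", is_fasta). On Python 2, raw_input can be passed as read_input.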
Arya McCarthy
  • I'm trying to figure out how to integrate the part of the code starting at 'with open(filename)....' into my function, and I'm still having difficulty making it work. Do you think I should just make this a separate function and then call it in my original validation function? – Bob McBobson May 31 '17 at 21:09
  • This already is a freestanding function, so you should be able to call it from within your validation function. Could you clarify your intent? – Arya McCarthy May 31 '17 at 21:21
  • I just wanted to make sure that my code didn't break and would continue to prompt the user for a valid answer. And your setup did work in the end! Thank you very much. – Bob McBobson May 31 '17 at 22:12
  • @AryaMcCarthy can you comment on mine https://bioinformatics.stackexchange.com/questions/15050/biopython-seqio-check-input-file – pippo1980 Dec 13 '20 at 11:30