Reverse complement from a file

Question

The task is: write a script (call it what you want) that can analyze a FASTA file (MySequences.fasta) by finding the reverse complement of its sequences, using Python.

from itertools import repeat

#opening file

filename = "MySequences.fasta"
file = open(filename, 'r')

#reading the file

for line in file:
    line = line.strip()
    if ">" in line:
        header = line
    elif (len(line) == 0):
        continue
    else:
        seq = line

#reverse complement

def reverse_complement(seq):
    compline = ''
    for n in seq:
        if n == 'A':
            compline += 'T'
        elif n == 'T':
            compline += 'A'
        elif n == 'C':
            compline += 'G'
        elif n == 'G':
            compline += 'C'
    return((compline)[::-1])

#run each line

for line in file:
    rc = reverse_complement(seq)
    print(rc)

Ok...so what's the problem? What's the current output? What's the expected output? Don't get mad at me, but it's not a jira board to just push tasks to. Please tell us what's up? — Gameplay, Dec 02 '22 at 13:22
I am so sorry for not being more informative. My problem is that it only prints the last sequence in the file, and not all of them. So I am hoping someone could help me understand what I've done wrong, because I do not know what to do to fix it. — notgoodatinformatics, Dec 02 '22 at 13:27
In addition to the answer below, I suggest to optimize the code for speed. For example, use `translate` on the entire sequence, instead of replacing one character at a time. For an example of usage, see https://stackoverflow.com/q/56378522/967621 . You may also want to add N -> N replacement (N being any nucleotide). — Timur Shtatland, Dec 02 '22 at 15:39

Constantin Hong · Accepted Answer · 2022-12-02T14:07:19.943

You call your function in the wrong place. To run it on every sequence, call it inside the loop that reads the file.

#reading the file

for line in file:
    line = line.strip()
    if ">" in line:
        header = line
    elif (len(line) == 0):
        continue
    else:
        seq = line
        #run the function on each sequence line as it is read
        rc = reverse_complement(seq)
        print(rc)

In your original code, the first loop does iterate over every line, but it never passes those lines to the function. Each pass overwrites seq, so once the loop has finished, seq holds only the last sequence line. You then pass that last line to the function at the end. This is why your code prints only one sequence.
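The effect described above can be reproduced without a file. The following is a minimal sketch: the list `lines` is a made-up stand-in for the contents of MySequences.fasta, and `last_only` and `collected` are names introduced only for this illustration.

```python
# Hypothetical stand-in for the lines of a FASTA file (not real data).
lines = [">seq1", "AAGT", ">seq2", "CCGA", ">seq3", "TTGC"]

# Pattern from the question: seq is overwritten on every pass.
for line in lines:
    if not line.startswith(">"):
        seq = line
last_only = seq  # only the final sequence line is left

# Pattern from the answer: use each sequence inside the loop.
collected = []
for line in lines:
    if not line.startswith(">"):
        collected.append(line)

print(last_only)   # TTGC
print(collected)   # ['AAGT', 'CCGA', 'TTGC']
```

Anything that has to happen once per sequence therefore belongs inside the loop body, not after it.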

The solution.

#reverse complement

def reverse_complement(seq):
    compline = ''
    for n in seq:
        if n == 'A':
            compline += 'T'
        elif n == 'T':
            compline += 'A'
        elif n == 'C':
            compline += 'G'
        elif n == 'G':
            compline += 'C'
    return((compline)[::-1])


#opening file

filename = "MySequences.fasta"
file = open(filename, 'r')


#reading the file

for line in file:
    line = line.strip()
    if ">" in line:
        header = line
    elif (len(line) == 0):
        continue
    else:
        seq = line
        #run each line
        rc = reverse_complement(seq)
        print(rc)
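Following the suggestion in the comments on the question, the per-character loop could be replaced by `str.translate`. The version below is only a sketch, and it makes assumptions that go beyond the code above: sequences may be wrapped over several lines, lowercase bases and N should be kept, and characters outside the table pass through unchanged (the loop version silently drops them). The names `COMPLEMENT` and `read_fasta` and the sample records are invented for illustration.

```python
import io

# Translation table: each base maps to its complement; N stays N.
# Characters not listed here (e.g. other IUPAC codes) pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining sequences wrapped over lines.

    Sequence lines that appear before the first header are ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample; use open("MySequences.fasta") instead for a real file.
sample = io.StringIO(">seq1\nAAGT\nCC\n\n>seq2\nGGNA\n")
with sample as handle:
    results = [(header, reverse_complement(seq))
               for header, seq in read_fasta(handle)]

for header, rc in results:
    print(header)
    print(rc)
```

Joining the wrapped lines before reversing matters: reversing each line separately would give a different result from reversing the whole record.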

Your last loop also has a second mistake: it passes seq to the function instead of line. But even if you fix that, the loop still won't work, for the reason I explained above.

for line in file:
    rc = reverse_complement(seq)
    print(rc)
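A related detail that may be worth checking: a file object is an iterator, so once a `for` loop has read it to the end, a second loop over the same object gets no lines unless the file is reopened or rewound with `seek(0)`. The sketch below uses `io.StringIO` with made-up contents as a stand-in for the open file, so it runs without MySequences.fasta.

```python
import io

# Stand-in for open("MySequences.fasta"); the contents are made up.
handle = io.StringIO(">seq1\nAAGT\n>seq2\nCCGA\n")

first_pass = [line.strip() for line in handle]   # reads every line
second_pass = [line.strip() for line in handle]  # iterator is already exhausted

handle.seek(0)                                   # rewind to the beginning
third_pass = [line.strip() for line in handle]   # lines are available again

print(len(first_pass), len(second_pass), len(third_pass))  # 4 0 4
```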

Thank you so very much Constantin Hong! This helped a lot — notgoodatinformatics, Dec 02 '22 at 13:45
