I seem to be having a trivial issue with comparing strings in Python. I'm reading in from a text file and then comparing three characters at a time. The first "if" condition always seems to be treated as true, which baffles me. (Note that the input is printed out in the loop as a test, and it shows the correct strings being compared.) Thanks for any help/advice :)

Text file input:

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGa
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

infile = open('DNA.txt', 'r')

while True:
    line = infile.readline()
    if not line: break
    a = []
    for i in range (0, len(line), 3):
        DNA = line[i:i+3]
        print DNA

        if DNA == 'ATT' or 'ATC' or 'ATA':
            a.append('I')

        elif DNA == 'CTT' or 'CTC' or 'CTA' or 'CTG' or 'TTA' or 'TTG':
            a.append('L')

        elif DNA == 'GTT' or 'GTC' or 'GTA' or 'GTG':
            a.append('V')

        elif DNA == 'TTT' or 'TTC':
            a.append('F')

        elif DNA == 'ATG':
            a.append('M')

        else:
            a.append('X')

    print str(a)

Output:

ACA
TTT
GCT
TCT
GAC
ACA
ACT
GTG
TTC
ACT
AGC
AAC
CTC
AAA
CAG
ACA
CCA
TGG
TGC
ATC
TGA
CTC
CTG
a

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

GGA
GAA
GTC
TGC
CGT
TAC
TGC
CCT
GTG
GGG
CAA
GGT
GAA
CGT
GGA
TGA
AGT
TGG
TGG
TGA
GGC
CCT
GGG
C

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

AGG
CTG
CTG
GTG
GTC
TAC
CCT
TGG
ACC
CAG
AGG
TTC
TTT
GAG
TCC
TTT
GGG
GAT
CTG
TCC
ACT
CCT
GAT
G

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

CTG
TTA
TGG
GCA
ACC
CTA
AGG
TGA
AGG
CTC
ATG
GCA
AGA
AAG
TGC
TCG
GTG
CCT
TTA
GTG
ATG
GCC
TGG
C

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

TCA
CCT
GGA
CAA
CCT
CAA
GGG
CAC
CTT
TGC
CAC
ACT
GAG
TGA
GCT
GCA
CTG
TGA
CAA
GCT
GCA
CGT
GGA
T

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

CCT
GAG
AAC
TTC
AGG
CTC
CTG
GGC
AAC
GTG
CTG
GTC
TGT
GTG
CTG
GCC
CAT
CAC
TTT
GGC
AAA
GAA
TTC
A

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

CCC
CAC
CAG
TGC
AGG
CTG
CCT
ATC
AGA
AAG
TGG
TGG
CTG
GTG
TGG
CTA
ATG
CCC
TGG
CCC
ACA
AGT
ATC
A

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

CTA
AGC
TCG
CTT
TCT
TGC
TGT
CCA
ATT
TCT
ATT
AAA
GGT
TCC
TTT
GTT
CCC
TAA
GTC
CAA
CTA
CTA
AAC
T

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

GGG
GGA
TAT
TAT
GAA
GGG
CCT
TGA
GCA
TCT
GGA
TTC
TGC
CTA
ATA
AAA
AAC
ATT
TAT
TTT
CAT
TGC
['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']
karthikr
Jared
  • So do you mean the `if not line: break` is not working to terminate the loop? Try `if not line.strip(): break`. It is probably the case that `line == '\n'`. – Two-Bit Alchemist Sep 03 '15 at 19:17
  • The line break is definitely working (else I'd have an infinite loop). The issue is in the comparison of the DNA strings, they always seem to think the first "if" statement is true :/ – Jared Sep 03 '15 at 19:19
  • Oh I see now. There was too much bad formatting and I missed a mistake in your code. – Two-Bit Alchemist Sep 03 '15 at 19:20
  • Sorry about the formatting, I'm still learning the ins and outs :) – Jared Sep 03 '15 at 19:24
  • It's not your fault. We had a discussion on Meta [just the other day](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors) about why code formatting is difficult for new users here. Also not a lot of people working with genetics have encountered Markdown. :P – Two-Bit Alchemist Sep 03 '15 at 19:27

2 Answers

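It always appends 'I' because the condition

if DNA == 'ATT' or 'ATC' or 'ATA':

is always truthy. It is equivalent to:

if (DNA == 'ATT') or ('ATC') or ('ATA'):

A non-empty string such as 'ATC' is always truthy, so whenever the first comparison is False the expression evaluates to 'ATC' and the branch is taken anyway.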
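To see what the condition really evaluates to, here is a small check you can paste into an interpreter. The sample codon 'ACA' and the variable names `result` and `intended` are just picked for illustration:

```python
DNA = 'ACA'  # a codon that is not in the 'I' group

# `or` returns its first truthy operand, so this is the string 'ATC', not False.
result = DNA == 'ATT' or 'ATC' or 'ATA'
print(repr(result))

# Any non-empty string is truthy.
print(bool('ATC'))

# What the question intended: compare DNA against each codon explicitly.
intended = DNA == 'ATT' or DNA == 'ATC' or DNA == 'ATA'
print(intended)
```

The first print shows 'ATC', which is why the first branch runs for every codon.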

You could check this way:

if DNA in ['ATT', 'ATC', 'ATA']:

The same applies to the other if/elif conditions.


Also, note that all of this logic:

infile = open('DNA.txt', 'r')

while True:
    line = infile.readline()
    if not line: break

can be replaced by

with open('DNA.txt', 'r') as infile:
    for line in infile:
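A minimal sketch of how that loop might be filled in, assuming each line of the file is one piece of sequence. The helper names `split_codons` and `read_codon_lines` are made up for this example:

```python
def split_codons(line):
    """Split one line of text into three-character chunks."""
    line = line.strip()  # remove the trailing newline before slicing
    return [line[i:i + 3] for i in range(0, len(line), 3)]


def read_codon_lines(path):
    """Return one list of chunks per non-empty line in the file at `path`."""
    with open(path, 'r') as infile:  # the file is closed when the block exits
        return [split_codons(line) for line in infile if line.strip()]
```

Calling `read_codon_lines('DNA.txt')` would then give one list of chunks per line, with no explicit `readline()` or `break` needed.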

Another approach is to use a dictionary that maps each codon to its letter and look the codon up. That way, you can replace the whole if/elif chain. For example:

dna_dict = {
    'ATT': 'I',
    'ATC': 'I',
    'ATA': 'I',
    ....
}

And then:

a.append(dna_dict.get(DNA, 'X'))
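For reference, the dictionary idea could be written out in full roughly like this, using only the codon groups from the question's if/elif chain. `CODON_GROUPS` and `translate` are names invented for this sketch, not part of the original code:

```python
# Codon groups taken from the if/elif chain in the question.
CODON_GROUPS = {
    'I': ['ATT', 'ATC', 'ATA'],
    'L': ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'],
    'V': ['GTT', 'GTC', 'GTA', 'GTG'],
    'F': ['TTT', 'TTC'],
    'M': ['ATG'],
}

# Invert the groups into a flat codon -> letter lookup table.
dna_dict = {codon: letter
            for letter, codons in CODON_GROUPS.items()
            for codon in codons}


def translate(line):
    """Map each three-letter chunk of `line` to a letter, 'X' if unknown."""
    line = line.strip()
    return [dna_dict.get(line[i:i + 3], 'X') for i in range(0, len(line), 3)]


print(translate('ATTCTGGTGTTTATGACA'))  # ['I', 'L', 'V', 'F', 'M', 'X']
```

Writing the groups per letter and inverting them keeps the table short to type, while the lookup itself stays a single `dict.get` call with 'X' as the fallback.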
karthikr

This version is a lot more readable:

with open('file.txt') as f:
    data = f.readlines()

for line in data:
    line = line.strip()  # drop the trailing newline so it is not sliced into a chunk
    if not line:
        continue
    a = []
    segment = [line[i:i+3] for i in range(0, len(line), 3)]
    for dna in segment:
        if dna in ['ATT', 'ATC', 'ATA']:
            a.append('I')
        elif dna in ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG']:
            a.append('L')
        elif dna in ['GTT', 'GTC', 'GTA', 'GTG']:
            a.append('V')
        elif dna in ['TTT', 'TTC']:
            a.append('F')
        elif dna in ['ATG']:
            a.append('M')
        else:
            a.append('X')
    print a
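One thing the output in the question suggests: every line leaves a single leftover letter ('a', 'C', 'G', ...), so splitting line by line restarts the grouping on each line. If the file is really one continuous sequence (an assumption, the question doesn't say), a sketch like the following joins the lines first. The name `codons_from_lines` is hypothetical, and uppercasing the input is also an assumption, made because the sample file contains a lowercase 'a':

```python
def codons_from_lines(lines):
    """Join the lines into one sequence, then split it into codons.

    Assumes the lines are consecutive pieces of a single sequence.
    """
    sequence = ''.join(line.strip() for line in lines).upper()
    # Stop before any incomplete trailing fragment shorter than three letters.
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


# Two short lines whose lengths are not multiples of three.
print(codons_from_lines(['ACATa\n', 'GGAG\n']))  # ['ACA', 'TAG', 'GAG']
```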
Cody Bouche