
I have the DNA sequence "atgactgccatggaggagtc". The problem asks me to split it into triplets (codons) and translate each triplet into an amino acid. I have code that does that. However, the last two nucleotides don't form a complete triplet. How can I make Python output "-" whenever a triplet has fewer than 3 nucleotides?

DNA = "ATGACTGCCATGGAGGAGTC"
codon = [DNA[i:i+3] for i in range(0, len(DNA), 3)]
print(codon)

def translate(codon):
      
    table = {
        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',                
        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
        'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
        'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
    }
    protein = ""
    for i in range(0, len(codon), 3):
        codons = codon[i:i + 3]
        protein += table[codons]
    return protein

translate(DNA)
BrokenBenchmark
pbicez

1 Answer


You can use dict.get(), which returns the value for a key if the key is in the dictionary and otherwise returns its second argument. By default that fallback is None, but here we pass '-' explicitly, as the question requires:

Change

protein += table[codons]

to

protein += table.get(codons, '-')

and you'll get an output of:

MTAMEE-
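
As a small standalone illustration of the fallback behaviour, here is a minimal sketch. The name mini_table is hypothetical and holds only a three-entry subset of the question's dictionary, purely for demonstration:

```python
# Hypothetical three-entry subset of the codon table, for illustration only.
mini_table = {'ATG': 'M', 'ACT': 'T', 'GAG': 'E'}

print(mini_table.get('ATG', '-'))  # key present: prints M
print(mini_table.get('TC', '-'))   # incomplete codon, key absent: prints -
print(mini_table.get('TC'))        # no default supplied: prints None
```

Note that the fallback applies to every missing key, not only to short ones. With the full table, a complete triplet containing an unexpected letter (for example 'ANG') would also be translated as '-'.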

Besides your immediate question, I would also suggest improving the code in two ways:

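  1. The dictionary doesn't need to be rebuilt every time the function is called. Define it once, outside the function.

  2. Repeated string concatenation can be slow for long sequences, because each += may copy the string built so far. Collect the pieces in a list and combine them once with ''.join() instead.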

With these extra changes, we get:

DNA = "ATGACTGCCATGGAGGAGTC"
codon = [DNA[i:i+3] for i in range(0, len(DNA), 3)]
# table = {...}  (same dictionary as in the original question, omitted for brevity)

def translate(codon):
    protein = []
    for i in range(0, len(codon), 3):
        codons = codon[i:i + 3]
        protein.append(table.get(codons, '-'))
    return ''.join(protein)

print(translate(DNA))
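
For readers who want something they can paste and run as-is, the following is one possible self-contained variant, not the code from the question. It assumes the standard genetic code and builds the 64-entry table compactly instead of spelling it out; the names BASES, AMINO_ACIDS, CODON_TABLE, and translate_dna are made up for this sketch. Stop codons are written as '_' to match the question's table.

```python
from itertools import product

# Assumption: standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
# One amino-acid letter per codon in that order; '_' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Built once at import time, not on every call.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(dna, missing="-"):
    """Translate dna three letters at a time; unknown or short codons become missing."""
    dna = dna.upper()  # accept lowercase input such as the sequence in the question
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], missing)
        for i in range(0, len(dna), 3)
    )


print(translate_dna("atgactgccatggaggagtc"))  # MTAMEE-
```

Passing the fallback character as a parameter keeps the '-' convention in one place, so it is easy to swap for another placeholder if the assignment changes.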
BrokenBenchmark