What is wrong with the code below, and why am I getting the error KeyError: '['?

The program is meant to convert the input DNA sequence to an RNA sequence, and then use the RNA sequence stored in the list RNA to produce the amino acid sequence by looking it up in the AMINO_ACIDS dict.

Thanks

DNA = "ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC"
RNA = []

AMINO_ACIDS = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

RNA_2 = str(RNA)
for char in DNA:
    if char == "G":
        RNA.append("C")
    elif char == "C":
        RNA.append("G")
    elif char == "A":
        RNA.append("U")
    elif char == "T":
        RNA.append("A")

translated = ''.join(AMINO_ACIDS[i] for i in RNA_2)

print("DNA sequence: " + DNA)
print()
print("Length of DNA sequence in base pairs: " + str(len(DNA)))
print()
print("RNA sequence of DNA sequence: " +("".join(RNA)))
print()
print("AMINO ACID sequence: " + str(translated))
  • Could you post your complete traceback? – SmeltQuake Feb 04 '15 at 15:33
  • RNA_2 is going to equal '[]' because you set RNA_2 = str(RNA) when RNA is an empty list – user2097159 Feb 04 '15 at 15:33
  • C:\Python34\python.exe C:/Users/Luan/Desktop/PYTHON/PROGRAMS/dna_to_rna.py
    Traceback (most recent call last):
      File "C:/Users/Luan/Desktop/PYTHON/PROGRAMS/dna_to_rna.py", line 32, in <module>
        translated = ''.join(AMINO_ACIDS[i] for i in RNA_2)
      File "C:/Users/Luan/Desktop/PYTHON/PROGRAMS/dna_to_rna.py", line 32, in <genexpr>
        translated = ''.join(AMINO_ACIDS[i] for i in RNA_2)
    KeyError: '['
    Process finished with exit code 1 –  Feb 04 '15 at 15:36
  • When you do `RNA_2 = str(RNA)`, `RNA` is an empty list, so `RNA_2` becomes a string containing the two characters "[" and "]". Is that the intended behavior? – Kevin Feb 04 '15 at 15:39
  • Hi there, DNA should become ACAAGAUGCCAUUGUCCCCCGGCCUCCUGCUGCUGCUGCUCUCCGGGGCCACGGCCACCGCUGCCCUGC and then that should become TRCHCPPASCCCCSPGPRPPLPC –  Feb 04 '15 at 15:44

1 Answer

You don't need RNA_2, but you do need a way to split the RNA string into three-character chunks. Borrowing a chunks function from this post:

def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in range(0, len(l), n):  # range works on Python 2 and 3; xrange is Python 2 only
        yield l[i:i+n]

DNA = "ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC"
RNA = []

AMINO_ACIDS = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

for char in DNA:
    if char == "G":
        RNA.append("C")
    elif char == "C":
        RNA.append("G")
    elif char == "A":
        RNA.append("U")
    elif char == "T":
        RNA.append("A")


translated = ''.join(AMINO_ACIDS[i] for i in chunks("".join(RNA), 3))

print("DNA sequence: " + DNA)
print()
print("Length of DNA sequence in base pairs: " + str(len(DNA)))
print()
print("RNA sequence of DNA sequence: " +("".join(RNA)))
print()
print("AMINO ACID sequence: " + str(translated))

Result:

DNA sequence: ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC
()
Length of DNA sequence in base pairs: 69
()
RNA sequence of DNA sequence: UGUUCUACGGUAACAGGGGGCCGGAGGACGACGACGACGAGAGGCCCCGGUGCCGGUGGCGACGGGACG
()
AMINO ACID sequence: CSTVTGGRRTTTTRGPGAGGDGT

A little more about your original error. I think you may be misunderstanding what RNA_2 = str(RNA) does. It doesn't mean "now and forever, RNA_2 will be the string version of RNA, and keep up-to-date whenever RNA changes". It means "Take the contents of RNA at this instant in time, turn it into a string, and that's what RNA_2 will be, even when RNA changes later". So RNA_2 will be "[]" even after you've appended values to RNA. This is the source of your KeyError. "[" is the first character of RNA_2, and "[" is not present in AMINO_ACIDS.
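
The snapshot behaviour is easy to check in isolation. Here is a minimal sketch; the names `bases` and `snapshot` are made up for the illustration and are not part of your program:

```python
# Hypothetical names, used only to illustrate the point above.
bases = []
snapshot = str(bases)    # converted once, right now: the two-character text "[]"

bases.append("U")
bases.append("G")

# snapshot is an independent string; it did not follow the list.
print(repr(snapshot))    # '[]'
print(bases)             # ['U', 'G']

# Iterating over snapshot starts with "[", the key from the KeyError.
first_char = snapshot[0]
print(first_char)        # [
```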

But even if you did RNA_2 = str(RNA) after you finished your appending loop, I don't think it would give you the result you would want. It would be ['U', 'G', 'U', 'U', 'C', ... rather than "UGUUC". If you want the latter, you ought to use "".join(RNA) rather than str(RNA).
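
To see the difference side by side, here is a small sketch with a five-element sample list (the variable names are assumptions for the example):

```python
rna = ["U", "G", "U", "U", "C"]   # sample list, assumed for illustration

as_repr = str(rna)        # the list's printed form: brackets, quotes, commas
as_text = "".join(rna)    # the elements concatenated with nothing in between

print(as_repr)   # ['U', 'G', 'U', 'U', 'C']
print(as_text)   # UGUUC
```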

But even if you use "".join(RNA), iterating through it and trying to access AMINO_ACIDS won't work, because the keys of AMINO_ACIDS are all three characters long, and iterating over a string gives you one character at a time. That's where chunks comes in, letting you iterate three characters at a time.
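
As a rough sketch of that last point, this compares plain iteration with chunked iteration on the first nine bases of the RNA output shown earlier. The variable names are only illustrative, and `chunks` is repeated so the snippet runs on its own:

```python
def chunks(seq, n):
    """Yield successive n-sized pieces of seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

rna_text = "UGUUCUACG"   # first nine bases of the RNA output

per_char = list(rna_text)                # what a plain for-loop over the string sees
per_codon = list(chunks(rna_text, 3))    # three characters at a time

print(per_char)    # ['U', 'G', 'U', 'U', 'C', 'U', 'A', 'C', 'G']
print(per_codon)   # ['UGU', 'UCU', 'ACG']
```

One thing to watch: if the sequence length is not a multiple of three, the last chunk will be shorter (for example `list(chunks("UGUU", 3))` gives `['UGU', 'U']`), and looking that leftover piece up in AMINO_ACIDS would raise a KeyError of its own.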

Kevin