I need help making this program take a DNA sequence and break it up into groups of three (ATGCGTGGC => ATG, CGT, GGC) to create codons. It then looks each codon up in the 'genecode' dictionary in my code. It seems to work fine until it gets to the end of the last line, where the Amino_A lookup raises this KeyError:
Traceback (most recent call last):
  File "main.py", line 101, in <module>
    Amino_A=genecode[codon_1]
KeyError: ''
What am I doing wrong? I used list comprehensions to get rid of the newlines ('\n') by replacing them with an empty string (''), but I'm guessing that when it reaches the end of the file an empty string ('') is left over and that is what triggers the error above. What do I do? Thanks for any help!
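A small experiment may help pin down where the '' comes from. This is only a sketch: the two-line string `text` is made up and stands in for the file contents, and the other names are hypothetical too. It mimics the comprehension used in the script and suggests that the loop bound counts the newline characters, so the last slice starts past the end of the cleaned string and comes back empty instead of raising an error.

```python
# Made-up stand-in for the text returned by Gene1.read()
text = "ATGCGT\nGGC\n"

# Same pattern as in the script: one full cleaned copy per character of text
copies = [text.replace('\n', '') for ch in text]
cleaned = copies[0]

# The list is as long as the raw text (newlines included),
# which is longer than the cleaned sequence.
print(len(text), len(copies), len(cleaned))

# Start positions produced by range(0, len(copies), 3)
starts = list(range(0, len(copies), 3))

# The last start position lies beyond the cleaned string, so the slice is ''
last_chunk = cleaned[starts[-1]:starts[-1] + 3]
print(starts, repr(last_chunk))
```

If that is what happens with the real files, the '' does not come from the file itself but from looping further than the cleaned sequence is long.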
genecode = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}
Gene1=open("HBB Norm.csv", "r")
Gene2=open("HBB Pos (Sickle Cell).csv", "r")
## "HBB Norm.csv" and "HBB Pos (Sickle Cell).csv" are just the names of the CSV files I want to import data from
Gene_1=Gene1.read()
Gene_11=[Gene_1.replace('\n','') for gene_1 in Gene_1]
Gene1.close()
Gene_2=Gene2.read()
Gene_22=[Gene_2.replace('\n','') for gene_2 in Gene_2]
Gene2.close()
AA_diff=[]
for i in range(len(Gene_11)):
    Gene_112=Gene_11[i]
    Gene_113=Gene_22[i]
    for codon in range(0,len(Gene_11),3):
        codon_1=Gene_112[codon:codon+3]
        Amino_A=genecode[codon_1]
        codon_2=Gene_113[codon:codon+3]
        Amino_B=genecode[codon_2]
        if Amino_A!=Amino_B: # Trying to get a dash between the differing amino acids
            AA_diff.append(Amino_B)
            print(Amino_A,'-',Amino_B)
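One possible way around the KeyError, offered as a sketch rather than a definitive fix: clean each sequence once into a single string, cut it into complete codons only, and compare the two codon lists pairwise. Everything here is an assumption made for illustration: the helper names `clean`, `codons` and `amino_acid_diffs` are hypothetical, the dictionary is only an excerpt of `genecode`, and the two toy sequences are made up in place of the real CSV contents. It also assumes the files hold nothing but bases and line breaks.

```python
# Excerpt of the genecode table, just enough for the toy input below.
genecode = {
    'ATG': 'M', 'GTG': 'V', 'CAT': 'H', 'CTG': 'L', 'ACT': 'T',
    'CCT': 'P', 'GAG': 'E', 'AAG': 'K', 'TAA': '_',
}


def clean(text):
    """Remove all whitespace (newlines, spaces, tabs) from the raw text."""
    return ''.join(text.split())


def codons(seq):
    """Split seq into complete 3-letter codons, ignoring 1-2 leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def amino_acid_diffs(raw_a, raw_b, table):
    """Return (position, amino_a, amino_b) for each codon position that differs."""
    diffs = []
    pairs = zip(codons(clean(raw_a)), codons(clean(raw_b)))
    for position, (codon_a, codon_b) in enumerate(pairs, start=1):
        amino_a = table[codon_a]
        amino_b = table[codon_b]
        if amino_a != amino_b:
            diffs.append((position, amino_a, amino_b))
    return diffs


# Toy sequences (hypothetical, not the real file contents), with line breaks
# and a trailing newline like a text file would have.
normal = "ATGGTGCAT\nCTGACTCCT\nGAGGAGAAG\n"
variant = "ATGGTGCAT\nCTGACTCCT\nGTGGAGAAG\n"

for position, amino_a, amino_b in amino_acid_diffs(normal, variant, genecode):
    print(position, amino_a, '-', amino_b)
```

Because the loop bound comes from the cleaned string itself, no slice can start past its end, so the lookup never receives ''. With the real files, the two strings would come from `Gene1.read()` and `Gene2.read()` and the full `genecode` dictionary would be passed as `table`.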