I want to write a function that takes a long string of characters (an RNA sequence like 'UGGUGUUAUUAAUGGUUU') and extracts three characters at a time from it (i.e. the codons). It can either return the codons one after another, or return a list containing all of them. Either way would work, but I'm having trouble figuring out how to do this cleanly.
Here's what I have so far:
def get_codon_list(codon_string):
    codon_start = 0
    codon_length = 3
    codon_end = 3
    codon_list = []
    for x in range(len(codon_string) // codon_length):
        codon_list.append(codon_string[codon_start:codon_end])
        codon_start += codon_length
        codon_end += codon_length
    return codon_list
It works and returns a list of the codons, but it seems very inefficient. I don't like using hard-coded numbers and incrementing variables like that if there is a better way. I also don't like writing a for loop that never uses its loop variable; that doesn't seem like a proper use of a loop.
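For reference, here is a rough sketch of one direction I've been considering: letting range() step through the start offsets so the loop variable is actually used. The names iter_codons and get_codon_list_stepped are just ones I made up for illustration, and I'm assuming that a trailing incomplete codon should be dropped, which is what my version above does. I'm not sure this is the most Pythonic way either.

```python
def iter_codons(sequence, codon_length=3):
    """Yield codons one after another (hypothetical helper name).

    Assumption: a trailing chunk shorter than codon_length is dropped.
    """
    # Stop early enough that every slice is a full codon.
    last_start = len(sequence) - codon_length + 1
    for start in range(0, last_start, codon_length):
        yield sequence[start:start + codon_length]


def get_codon_list_stepped(sequence, codon_length=3):
    """Return all codons as a list, built on the generator above."""
    return list(iter_codons(sequence, codon_length))


print(get_codon_list_stepped('UGGUGUUAUUAAUGGUUU'))
# ['UGG', 'UGU', 'UAU', 'UAA', 'UGG', 'UUU']
```

The generator covers the "one after another" case and the list wrapper covers the other, so both return styles come from the same loop.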
Any suggestions for how to improve this, either with a specific function/module, or just a better Pythonic technique?
Thanks!