Dani's answer is a nice simple solution, but I wanted to demonstrate another approach using a slightly more advanced programming technique in Python, generator functions:
def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]
Right now, mutation_generator is an infinite sequence generator, meaning that you could theoretically continue evolving your sequence indefinitely. If you want to grab 20 generations:
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
twenty_generations = [next(generation) for _ in range(20)]
The nice thing about this generator is that we can start it back up where it left off at any point. Say you've done some analysis on the first twenty generations, and now you want to see what happens over the next hundred:
next_hundred = [next(generation) for _ in range(100)]
Now, we could've initialized a new generator, using the last generation from twenty_generations as the initial generation of the new generator, but that's not necessary, since our generation generator simply paused after yielding 20 generations and is ready to resume mutating whenever you call next(generation).
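As a side sketch of the same pause-and-resume behaviour, itertools.islice from the standard library can replace the list comprehensions over next(). The mutate_v1 below is only a hypothetical stand-in (random substitution per base) so that the snippet runs on its own; it is not the function from the question.

```python
import random
from itertools import islice

def mutate_v1(seq, rate):
    # Hypothetical stand-in: each base is replaced by a random base
    # (possibly the same one) with probability `rate`.
    return ''.join(
        random.choice('ACGT') if random.random() < rate else base
        for base in seq
    )

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)

# Take generations 0 through 19 in one call.
twenty_generations = list(islice(generation, 20))

# The generator kept its state, so this continues with generations 20 through 119.
next_hundred = list(islice(generation, 100))
```

islice only pulls as many items as requested, so the infinite generator is never exhausted or restarted.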
This opens up a LOT of possibilities, including sending new mutation rate parameters into the running generator (generators can receive values through their send() method), or even, if you want, swapping in entirely new mutation functions.
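Here is one way the "sending a new mutation rate" idea could look, using the generator's built-in send() method. This is a minimal sketch, not part of the original code: tunable_mutation_generator and mutate are hypothetical names, and mutate is a simple stand-in for mutate_v1.

```python
import random

def mutate(seq, rate):
    # Hypothetical stand-in for mutate_v1: random substitution per base.
    return ''.join(
        random.choice('ACGT') if random.random() < rate else base
        for base in seq
    )

def tunable_mutation_generator(g0, rate=0.01):
    g = g0.copy()
    while True:
        # `yield` evaluates to None after next(), or to the value given to send().
        new_rate = yield g
        if new_rate is not None:
            rate = new_rate
        g = [mutate(seq, rate) for seq in g]

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = tunable_mutation_generator(g0, rate=0.0)

first = next(generation)      # generation 0, same contents as g0
second = next(generation)     # rate is 0.0, so nothing has changed
third = generation.send(0.5)  # switch to a 50% rate and get the next generation
```

Note that send() both delivers the new rate and advances the generator by one step, so its return value is the next generation, already mutated at the new rate.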
The other benefit here is that you can run multiple generators on the same initial sequence and observe how they diverge. Note that this is also possible with the more traditional approach of using a for loop in a function, but the benefit of using generators is that you don't have to compute the whole run of generations at once; a generator only mutates when you tell it to (via next()). For example:
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
universe_1 = mutation_generator(g0)
universe_2 = mutation_generator(g0)
universe_3 = mutation_generator(g0)
# The first generation is always the same as g0, but this can be modified if you desire
next(universe_1)
next(universe_2)
next(universe_3)
# Compare the first mutation without having to calculate twenty generations in each 'universe' before getting back results
first_mutation_u1 = next(universe_1)
first_mutation_u2 = next(universe_2)
first_mutation_u3 = next(universe_3)
Again, you can also modify the generator function mutation_generator to accept other parameters, like custom mutation functions, or even make it possible to change the mutation rate at any time, etc.
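For instance, a parameterized version might take the mutation function and rate as arguments. The sketch below is only an illustration under assumptions: substitute and transition are hypothetical mutation functions written for this example, and the mutate and rate parameters are not part of the original mutation_generator.

```python
import random

# Transitions swap purines (A <-> G) and pyrimidines (C <-> T).
TRANSITIONS = {'A': 'G', 'G': 'A', 'C': 'T', 'T': 'C'}

def substitute(seq, rate):
    # Hypothetical: replace each base with a random base with probability `rate`.
    return ''.join(
        random.choice('ACGT') if random.random() < rate else base
        for base in seq
    )

def transition(seq, rate):
    # Hypothetical: apply a transition to each base with probability `rate`.
    return ''.join(
        TRANSITIONS[base] if random.random() < rate else base
        for base in seq
    )

def mutation_generator(g0, mutate=substitute, rate=0.01):
    g = list(g0)
    while True:
        yield g
        g = [mutate(seq, rate) for seq in g]

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
universe_a = mutation_generator(g0, mutate=substitute, rate=0.01)
universe_b = mutation_generator(g0, mutate=transition, rate=1.0)

start_a = next(universe_a)  # generation 0 of universe_a
next(universe_b)            # generation 0 of universe_b
flipped = next(universe_b)  # rate 1.0: every base undergoes a transition
```

Each universe keeps its own mutation function and rate, so the two runs can be stepped independently and compared generation by generation.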
Finally, just as a side note, using a generator makes it very easy to skip thousands of generations without needing to keep every intermediate generation in memory:
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
for _ in range(10000):
    next(generation)
print(g0) # first gen
print(next(generation)) # ten thousand generations later
Output:
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['TTGGA', 'CTTCG', 'TGTGA', 'TAACA', 'CATCG']
With a for loop-based approach, you would've had to either create and store all 10000 generations (wasting a lot of memory), or modify the code in Dani's answer to behave more like a generator (but without the benefits!).
Real Python has a good article on generators if you want to learn more. And of course, check out the docs as well.