I have created a mutate_v1 function that generates random mutations in a DNA sequence.

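import random

def mutate_v1(sequence, mutation_rate):
    dna_list = list(sequence)
    for i in range(len(sequence)):
        # One mutation trial per base in the sequence
        r = random.random()
        if r < mutation_rate:
            # The mutated position is drawn at random; it is not tied to i
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice(list('ATCG'))
    # Return only after every trial has run
    return ''.join(dna_list)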

If I apply my function to all elements of G0 I get a new generation (G1) of mutants (a list of mutated sequences).

G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

G1 = [mutate_v1(s,0.01) for s in G0]

#G1
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

How can I repeat my function up to G20 (20 generations)?

I can do it manually like the following

G1   = [mutate_v1(s,0.01) for s in G0]
G2   = [mutate_v1(s,0.01) for s in G1]
G3   = [mutate_v1(s,0.01) for s in G2]
G4   = [mutate_v1(s,0.01) for s in G3]
G5   = [mutate_v1(s,0.01) for s in G4]
G6   = [mutate_v1(s,0.01) for s in G5]
G7   = [mutate_v1(s,0.01) for s in G6]

But I'm sure a for loop would be better. I have tried several approaches without success.

Can someone help, please?

2 Answers

Use range to iterate over the number of generations and store each generation in a list. Each generation is the result of mutating the previous one:

G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

generations = [G0]
for _ in range(20):
    previous_generation = generations[-1]
    generations.append([mutate_v1(s, 0.01) for s in previous_generation])

# then you can access any generation by index
print(generations[1])  # access generation 1
print(generations[20]) # access generation 20

Output

['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAT']
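
If you need this more than once, the loop above can be wrapped in a small function. The following is only a sketch: `evolve` and its parameter names are hypothetical, and `mutate_v1` is restated from the question (with `return` placed after the loop) so that the block runs on its own.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Restated from the question so this sketch is self-contained.
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            site = random.randint(0, len(dna_list) - 1)
            dna_list[site] = random.choice('ATCG')
    return ''.join(dna_list)

def evolve(g0, n_generations, mutation_rate=0.01):
    # Hypothetical helper: returns [G0, G1, ..., Gn] as a list of lists.
    generations = [list(g0)]
    for _ in range(n_generations):
        previous_generation = generations[-1]
        generations.append([mutate_v1(s, mutation_rate) for s in previous_generation])
    return generations

generations = evolve(['CTGAA'] * 5, 20)
print(len(generations))  # 21: G0 plus 20 mutated generations
print(generations[20])   # the contents vary from run to run
```

Returning the whole list keeps index access (`generations[20]`) working exactly as in the snippet above.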
Dani Mesejo
  • Good opportunity here for OP to learn about generators. – ddejohn Oct 15 '21 at 17:24
  • @ddejohn I don't see how this could be easily translated into a generator as each generation depends on the previous one. One solution could be to use accumulate, but that is all that comes to my mind – Dani Mesejo Oct 15 '21 at 17:27
  • Do you mind if I add an answer that involves generators, then? Your solution is great, but I think this is a good learning opportunity for OP. – ddejohn Oct 15 '21 at 17:28
  • @ddejohn No problem, feel free. – Dani Mesejo Oct 15 '21 at 17:28
  • @DaniMesejo to respond to your initial skepticism, a generator is actually a textbook use-case for sequences which depend on their most recent state (the "hello world" of generators is generating the naturals via incrementing the previous value)! See my answer for implementation details (tl;dr, `yield g` and then reassign `g` to its mutation). – ddejohn Oct 15 '21 at 18:10
Dani's answer is a nice simple solution, but I wanted to demonstrate another approach using a slightly more advanced programming technique in Python, generator functions:

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

Right now, mutation_generator is an infinite sequence generator, meaning that you could theoretically continue evolving your sequence indefinitely. If you want to grab 20 generations:

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
twenty_generations = [next(generation) for _ in range(20)]
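
One detail worth checking: the generator yields g0 first, so twenty calls to next() return G0 through G19. If you want to reach G20 as in the question, you need 21 items. A minimal sketch of that, using itertools.islice from the standard library instead of a list comprehension (mutate_v1 is restated from the question, with `return` after the loop, so the block runs on its own):

```python
import random
from itertools import islice

def mutate_v1(sequence, mutation_rate):
    # Restated from the question so this sketch is self-contained.
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            site = random.randint(0, len(dna_list) - 1)
            dna_list[site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

# islice stops the infinite generator after 21 items: G0 plus G1..G20.
up_to_g20 = list(islice(mutation_generator(g0), 21))

print(len(up_to_g20))       # 21
print(up_to_g20[0] == g0)   # True: the first item is the starting generation
print(up_to_g20[20])        # G20; the contents vary from run to run
```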

The nice thing about this generator is that we can start it back up where it left off at any point. Say you've done some analysis on the first twenty generations, and now you want to see what happens over the next hundred:

next_hundred = [next(generation) for _ in range(100)]

Now, we could've initialized a new generator, using the last generation from twenty_generations as the initial generation of the new generator, but that's not necessary, since our generation generator simply left off at 20 generations and is ready to go on mutating whenever you call next(generation).

This opens up a LOT of possibilities, including sending new mutation rate parameters, or even, if you want, entirely new mutation functions. Really, anything you want.
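
As a rough illustration of that idea, here is one way the generator could take the mutation function and the rate as arguments. The parameter names `mutate` and `mutation_rate`, and the alternative function `point_mutate`, are hypothetical and not part of the original code; mutate_v1 is restated from the question (with `return` after the loop) so the block runs on its own.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Restated from the question so this sketch is self-contained.
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            site = random.randint(0, len(dna_list) - 1)
            dna_list[site] = random.choice('ATCG')
    return ''.join(dna_list)

def point_mutate(sequence, mutation_rate):
    # Hypothetical alternative: each base is redrawn in place with
    # probability mutation_rate.
    return ''.join(
        random.choice('ATCG') if random.random() < mutation_rate else base
        for base in sequence
    )

def mutation_generator(g0, mutate=mutate_v1, mutation_rate=0.01):
    # Same loop as before, but the mutation function and rate are arguments.
    g = list(g0)
    while True:
        yield g
        g = [mutate(seq, mutation_rate) for seq in g]

g0 = ['CTGAA'] * 5
frozen = mutation_generator(g0, mutation_rate=0.0)
noisy = mutation_generator(g0, mutate=point_mutate, mutation_rate=0.5)

next(frozen)  # discard G0
next(noisy)   # discard G0

frozen_g1 = next(frozen)
noisy_g1 = next(noisy)
print(frozen_g1)  # a rate of 0.0 never mutates, so this is g0 unchanged
print(noisy_g1)   # the contents vary from run to run
```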

The other benefit here is that you can run multiple generators on the same initial sequence and observe how they diverge. Note this is totally possible with the more traditional approach of using a for loop in a function, but the benefit of using the generators is that you don't have to generate an entire sequence at once; it only mutates when you tell it to (via next()). For example:

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
universe_1 = mutation_generator(g0)
universe_2 = mutation_generator(g0)
universe_3 = mutation_generator(g0)

# The first generation is always the same as g0, but this can be modified if you desire
next(universe_1)
next(universe_2)
next(universe_3)

# Compare the first mutation without having to calculate twenty generations in each 'universe' before getting back results
first_mutation_u1 = next(universe_1)
first_mutation_u2 = next(universe_2)
first_mutation_u3 = next(universe_3)

Again, you can also modify the generator function mutation_generator to accept other parameters, like custom mutation functions, or even make it possible to change the mutation rate at any time, etc.
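
For changing the rate while the generator is running, Python's generator.send() is one option. The sketch below is just one possible design, not the only way to do it: a value passed to send() becomes the new mutation rate, while plain next() keeps the current one. mutate_v1 is restated from the question (with `return` after the loop) so the block runs on its own.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Restated from the question so this sketch is self-contained.
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            site = random.randint(0, len(dna_list) - 1)
            dna_list[site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0, mutation_rate=0.01):
    g = list(g0)
    while True:
        # send(value) makes this yield expression evaluate to value;
        # a plain next() makes it evaluate to None.
        new_rate = yield g
        if new_rate is not None:
            mutation_rate = new_rate
        g = [mutate_v1(seq, mutation_rate) for seq in g]

g0 = ['CTGAA'] * 5
generation = mutation_generator(g0, mutation_rate=0.0)

first = next(generation)            # G0; a generator must be started with next() before send()
unchanged = next(generation)        # the rate is 0.0, so nothing mutates
after_send = generation.send(1.0)   # switch the rate to 1.0 and get the next generation

print(unchanged == g0)  # True
print(after_send)       # the contents vary from run to run
```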

Finally, just as a side note, using a generator makes it very easy to skip thousands of generations without needing to store more than one generation in memory:

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
for _ in range(10000):
    next(generation)

print(g0)  # first gen
print(next(generation))  # ten thousand generations later

Output:

['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['TTGGA', 'CTTCG', 'TGTGA', 'TAACA', 'CATCG']

With a for loop-based approach, you would've had to either create and store all 10000 generations (wasting a lot of memory), or modify the code in Dani's answer to behave more like a generator (but without the benefits!).

Real Python has a good article on generators if you want to learn more. And of course, check out the docs as well.

ddejohn