I have a large set of DNA sequences (about 1.5 million), each around 1,000 characters long and drawn from the alphabet A, T, C, G.
I am simulating error mutations, and the simulation takes a long time to finish. I have identified the bottleneck: the function that changes the characters of the string:
def f(sequence, indexes_to_mutate):
    seq = list(sequence)
    for i in indexes_to_mutate:
        seq[i] = 'X'
    return ''.join(seq)
Is there a faster way to operate on the string, without having to convert it to a list and then back to a string?
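To make the question concrete, here is one possible direction as a minimal sketch, not a measured result. It assumes the sequences are pure ASCII (true for A, T, C, G and 'X') and swaps the list for a `bytearray`, which is mutable in place and avoids building a list of one-character strings. The function name `mutate_bytearray` and the `replacement` parameter are hypothetical, introduced only for illustration.

```python
def mutate_bytearray(sequence, indexes_to_mutate, replacement="X"):
    """Same contract as f() above, using a bytearray instead of a list.

    Assumption: the sequence and the replacement character are ASCII.
    """
    buf = bytearray(sequence, "ascii")  # mutable copy of the sequence bytes
    code = ord(replacement)             # bytearray items are ints, not str
    for i in indexes_to_mutate:
        buf[i] = code
    return buf.decode("ascii")          # back to an ordinary str


print(mutate_bytearray("ATCGATCG", [0, 3]))
```

Whether this actually helps depends on how many indexes are mutated per sequence, so it would need to be timed (for example with `timeit`) against `f` on representative data.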