I want to compare two DNA sequences and return the positions of the identical nucleotides as a list of pairs (position in sequence 1, position in sequence 2).
input:
a = [G, T, T, U, I, P]
b = [E, G, T, P]
output:
[[0,1], [1,2], [2,2], [5,3]]
You can do it with for loops:
a_s = ["G", "T", "T", "U", "I", "P"]
b_s = ["E", "G", "T", "P"]
d = []
for i, a in enumerate(a_s):
    for j, b in enumerate(b_s):
        if a == b:
            d.append([i, j])
print(d)
Out:
[[0, 1], [1, 2], [2, 2], [5, 3]]
Or you can do it in a single line with a list comprehension:
a_s = ["G", "T", "T", "U", "I", "P"]
b_s = ["E", "G", "T", "P"]
print([[x, y] for x, av in enumerate(a_s) for y, bv in enumerate(b_s) if av == bv])
This produces the same output.
Note: The first version is usually more readable; the second is more concise. Choose whichever fits the surrounding code and its purpose.
Here are two examples that use "product" from the "itertools" module to iterate over every pair of positions.
The first is a traditional for loop that appends to a list.
The second is the equivalent list comprehension.
from itertools import product
a = list('GTTUIP')
b = list('EGTP')
# Without a comprehension.
results = []
for (x, a_s), (y, b_s) in product(enumerate(a), enumerate(b)):
    if a_s == b_s:
        results.append([x, y])
print(results)
# With a comprehension
results = [[x, y]
           for (x, a_s), (y, b_s)
           in product(enumerate(a), enumerate(b))
           if a_s == b_s]
print(results)
OUT:
[[0, 1], [1, 2], [2, 2], [5, 3]]
[[0, 1], [1, 2], [2, 2], [5, 3]]
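All of the versions above compare every element of one sequence with every element of the other, which can get slow for long sequences. As a rough sketch of an alternative (not something from the answers above), you could first record where each symbol occurs in the second sequence and then look those positions up. The function name matching_positions and its parameter names are made up for this illustration.

```python
from collections import defaultdict


def matching_positions(seq_a, seq_b):
    """Return [i, j] pairs where seq_a[i] == seq_b[j] (hypothetical helper)."""
    # Map each symbol in seq_b to the list of positions where it occurs.
    positions_in_b = defaultdict(list)
    for j, symbol in enumerate(seq_b):
        positions_in_b[symbol].append(j)

    # For each position in seq_a, pair it with every matching position in seq_b.
    # .get() is used so that symbols missing from seq_b do not add empty entries.
    return [[i, j]
            for i, symbol in enumerate(seq_a)
            for j in positions_in_b.get(symbol, [])]


print(matching_positions(list("GTTUIP"), list("EGTP")))
# [[0, 1], [1, 2], [2, 2], [5, 3]]
```

The pairs come out in the same order as with the nested loops, because positions in the second sequence are stored in ascending order. If the sequences share many repeated symbols the result itself is large, so this mainly helps when matches are sparse.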