How to compare two DNA sequences and return the identical nucleotides in a pair list

Question

I want to compare two DNA sequences and return the identical nucleotides as a list of index pairs: [position in sequence 1, position in sequence 2].

input:

a = ['G', 'T', 'T', 'U', 'I', 'P']
b = ['E', 'G', 'T', 'P']

output:

[[0,1], [1,2], [2,2], [5,3]]

Are you after *all* pairs? So if you had `a=['T', 'T', 'T']; b = ['T', 'T', 'T']` you'd have 9 results? — Jon Clements, Nov 11 '18 at 23:49
Did you write any code for this? You need to share the code and explain what exact issue you are facing in that — Chetan, Nov 11 '18 at 23:50

Geeocode · Accepted Answer · 2018-11-11T23:58:31.910

You can do it with for loops:

a_s = ["G", "T", "T", "U", "I", "P"]
b_s = ["E", "G", "T", "P"]

d = []
for i, a in enumerate(a_s):
    for j, b in enumerate(b_s):
        if a == b:
            d.append([i, j])
print(d)

Out:

[[0, 1], [1, 2], [2, 2], [5, 3]]

Or you can do it in a single line with a list comprehension:

a_s = ["G", "T", "T", "U", "I", "P"]
b_s = ["E", "G", "T", "P"]

print([[x, y] for x, av in enumerate(a_s) for y, bv in enumerate(b_s) if av == bv])

This prints the same output as above.

Note: The first version is in most cases more readable; the second is more concise. Choose either one depending on the surrounding code and its purpose.
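If you need this in more than one place, the comprehension can be wrapped in a small function. This is only a sketch: the name `matching_pairs` is made up for illustration, and it assumes you want every matching pair, which is what the question in the comments was asking about. With three identical nucleotides on each side you get all 9 combinations.

```python
def matching_pairs(seq_a, seq_b):
    """Return [i, j] for every pair of positions where seq_a[i] == seq_b[j]."""
    return [[i, j]
            for i, x in enumerate(seq_a)
            for j, y in enumerate(seq_b)
            if x == y]


# Strings work as well as lists, since both can be enumerated.
print(matching_pairs("GTTUIP", "EGTP"))
# [[0, 1], [1, 2], [2, 2], [5, 3]]

# Every repeated match is reported, so 3 x 3 identical items give 9 pairs.
print(len(matching_pairs("TTT", "TTT")))
# 9
```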

edited Nov 11 '18 at 23:58

answered Nov 11 '18 at 23:52

Geeocode

Or rolled into a list-comp: `[[x, y] for x, av in enumerate(a) for y, bv in enumerate(b) if av == bv]` – Jon Clements Nov 11 '18 at 23:52
@JonClements The expanded form is sometimes more readable, but I can show him the comprehension as well – Geeocode Nov 11 '18 at 23:54
Indeed... if you're starting out it's definitely more readable. However, it's useful for learning and for others in the future that come across this post to see the other version - doesn't hurt to show both – Jon Clements Nov 11 '18 at 23:57
@JonClements See https://stackoverflow.com/questions/899103/writing-a-list-to-a-file-with-python – Geeocode Nov 12 '18 at 02:38
@JonClements Thanks, Jon! – Geeocode Nov 12 '18 at 03:36

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

Two examples leveraging "product" from the "itertools" module.

The first is a traditional for loop that appends a list.

The second is a list comprehension equivalent.

from itertools import product

a = list('GTTUIP')
b = list('EGTP')

# Without a comprehension.
results = []
for (x, a_s), (y, b_s) in product(enumerate(a), enumerate(b)):
    if a_s == b_s:
        results.append([x, y])
print(results)
 
# With a comprehension.
results = [[x, y]
           for (x, a_s), (y, b_s)
           in product(enumerate(a), enumerate(b))
           if a_s == b_s]
print(results)

OUT:

[[0, 1], [1, 2], [2, 2], [5, 3]]

[[0, 1], [1, 2], [2, 2], [5, 3]]
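To see why the unpacking `(x, a_s), (y, b_s)` works, it may help to look at what `product` yields when given two `enumerate` objects: each item is a pair of (index, value) tuples, one from each sequence. The shortened inputs here are chosen just to keep the printout small.

```python
from itertools import product

a = list('GTT')  # shortened inputs, for illustration only
b = list('GT')

# Each element is ((index_in_a, value_in_a), (index_in_b, value_in_b)).
pairs = list(product(enumerate(a), enumerate(b)))

for (x, a_s), (y, b_s) in pairs:
    print(x, a_s, y, b_s, a_s == b_s)
```

The iteration order matches the nested loops: the first sequence is the outer loop, the second the inner one.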

answered Nov 12 '18 at 00:12

dmmfll

2,666
2
35
41

For what it's worth, using `itertools.product` is roughly 10% faster. See this gist using the `timeit` module. Each comprehension is run 1 million times: https://gist.github.com/bb4f02e391c7e15df947df6918e7bd93 – dmmfll Nov 12 '18 at 10:56
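The gist itself is not reproduced here, so the following is only a rough sketch of how such a comparison could be set up with `timeit`. The function names `nested` and `with_product` and the run count are assumptions for illustration. Timings depend on the machine and Python version, so treat any difference you measure as indicative only.

```python
import timeit
from itertools import product

a = list('GTTUIP')
b = list('EGTP')


def nested():
    # Two nested loops inside one comprehension.
    return [[x, y] for x, av in enumerate(a)
            for y, bv in enumerate(b) if av == bv]


def with_product():
    # Same pairs, generated by itertools.product.
    return [[x, y] for (x, av), (y, bv)
            in product(enumerate(a), enumerate(b)) if av == bv]


# Both versions must agree before timing means anything.
assert nested() == with_product()

n = 100_000  # assumed run count, smaller than the 1 million in the comment
t_nested = timeit.timeit(nested, number=n)
t_product = timeit.timeit(with_product, number=n)
print(f"nested:  {t_nested:.3f} s for {n} runs")
print(f"product: {t_product:.3f} s for {n} runs")
```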
