
I have some code that finds the differences between two strings. At the moment it works for strings of the same length, but I am trying to get it to work for strings of different lengths. How can I do this?

I added a new variable, longest_seq, to try to work around this, but I'm not sure how to use it.

ref_seq = "pandabears"
map_seq = "pondabear"
longest_seq = map_seq

if len(ref_seq) > len(map_seq):
    longest_seq == ref_seq


for i in range(len(longest_seq)):
    if ref_seq[i] != map_seq[i]: 
        print i, ref_seq[i], map_seq[i]
wilberox
  • Can you be more clear about what it means to find "the differences" between two strings? One common way of computing this is the edit distance: https://en.wikipedia.org/wiki/Edit_distance. Your current code looks like it is printing characters if they differ at a specific spot. – kingkupps Sep 12 '19 at 22:47
  • @kingkupps Printing characters that differed at a specific spot is what I am trying to do. Apologies for confusion. – wilberox Sep 12 '19 at 23:09

2 Answers


For Python 2, you can use itertools.izip for this:

from itertools import izip

for i, j in izip(ref_seq, map_seq):
    if i != j: 
        print i, j

Output:

a o

In Python 3, you can use the built-in zip function:

for i, j in zip(ref_seq, map_seq):
    if i != j: 
        print(i, j)

zip also exists in Python 2, but itertools.izip is preferred there because it generates the tuples on demand (one per iteration) rather than building the whole list at once. In Python 3, the built-in zip behaves like Python 2's itertools.izip. Both stop at the end of the shorter string, so any trailing characters of the longer string are not compared.
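If the extra characters of the longer string should be reported too, one possible approach is itertools.zip_longest (itertools.izip_longest in Python 2), which pads the shorter input instead of stopping early. The sketch below is for Python 3; the function name position_differences and the "<empty>" placeholder are illustrative choices, not part of the original code.

```python
from itertools import zip_longest

ref_seq = "pandabears"
map_seq = "pondabear"


def position_differences(ref, other, missing="<empty>"):
    """Return (index, ref_char, other_char) for every position that differs."""
    diffs = []
    # zip_longest pads the shorter string with `missing` instead of stopping early
    for i, (a, b) in enumerate(zip_longest(ref, other, fillvalue=missing)):
        if a != b:
            diffs.append((i, a, b))
    return diffs


for i, a, b in position_differences(ref_seq, map_seq):
    print(i, a, b)
```

Using enumerate keeps the position alongside each pair, so the output includes the index the original loop printed.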

DjaouadNM
  • how can I edit my code to make it work on different length strings though? – wilberox Sep 12 '19 at 23:10
  • @wilberox What should happen for different length strings? `zip` will take care of the _taking the shortest length string_ problem. – DjaouadNM Sep 12 '19 at 23:16

Something like this should do the trick.

def different_characters(reference, target):
    # So we don't accidentally index the shorter string past its ending
    ending = min(len(reference), len(target))

    for i in range(ending):
        if reference[i] != target[i]:
            print(i, reference[i], target[i])

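    # Report leftover characters, keeping the reference column first.
    # At most one of these loops runs, depending on which string is longer.
    for i in range(ending, len(reference)):
        print(i, reference[i], '<empty>')
    for i in range(ending, len(target)):
        print(i, '<empty>', target[i])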


different_characters('pandabears', 'pondabear')

Which would print:

1 a o
9 s <empty>
kingkupps