25

I have two strings of equal length, how can I find all the locations where the strings are different?

For example, "HELPMEPLZ" and "HELPNEPLX" are different at positions 4 and 8.

Moshe
  • 9,283
  • 4
  • 29
  • 38
Linus Svendsson
  • 1,417
  • 6
  • 18
  • 21

7 Answers7

33

Try this:

s1 = 'HELPMEPLZ'
s2 = 'HELPNEPLX'
[i for i in xrange(len(s1)) if s1[i] != s2[i]]

It will return:

> [4, 8]

The above solution will return a list with the indexes in sorted order, won't create any unnecessary intermediate data structures and it will work on Python 2.3 - 2.7. For Python 3.x replace xrange for range.

Óscar López
  • 232,561
  • 37
  • 312
  • 386
23

Python really comes with batteries included. Have a look at difflib

>>> import difflib
>>> a='HELPMEPLZ'
>>> b='HELPNEPLX'
>>> s = difflib.SequenceMatcher(None, a, b)
>>> for block in s.get_matching_blocks():
...     print block
Match(a=0, b=0, size=4)
Match(a=5, b=5, size=3)
Match(a=9, b=9, size=0)

difflib is very powerful and a some study of the documentation is really recommended.

Fredrik Pihl
  • 44,604
  • 7
  • 83
  • 130
6
>>> from itertools import izip
>>> s1 = 'HELPMEPLZ'
>>> s2 = 'HELPNEPLX'
>>> [i for i,(a1,a2)  in enumerate(izip(s1,s2)) if a1!=a2]
[4, 8]
ovgolovin
  • 13,063
  • 6
  • 47
  • 78
2

Building on the direction pointed by @FredrikPihl, here you have a solution that is also able to detect insertions/deletions using a module in the Python Standard Library:

import difflib
a = 'HELPMEPLZ'
b = 'HELPNEPLX'
s = difflib.SequenceMatcher(None, a, b, autojunk=False)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    if tag != 'equal':
        print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
            tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))

With the output:

replace   a[4:5] --> b[4:5]      'M' --> 'N'
replace   a[8:9] --> b[8:9]      'Z' --> 'X'

Let's see how it works with a similar example including deletions and additions:

a = 'HELPMEPLZ'
b = 'HLP NEE PLX'

With the output:

delete    a[1:2] --> b[1:1]      'E' --> ''
replace   a[4:5] --> b[3:5]      'M' --> ' N'
insert    a[6:6] --> b[6:8]       '' --> 'E '
replace   a[8:9] --> b[10:11]      'Z' --> 'X'
khyox
  • 1,276
  • 1
  • 20
  • 22
2

If you store the two strings in a and b, you can loop through all the items and check for inequality.

python interactive interpreter:

>>> for i in range(len(a)):
...   if a[i] != b[i]: print i, a[i], b[i]
... 
4 M N
8 Z X

Another way to do this is with list comprehensions. It's all in one line, and the output is a list.

>>> [i for i in range(len(a)) if a[i] != b[i]]
[4, 8]

That makes it really easy to wrap into a function, which makes calling it on a variety of inputs easy.

>>> def dif(a, b):
...     return [i for i in range(len(a)) if a[i] != b[i]]
...
>>> dif('HELPMEPLZ', 'HELPNEPLX')
[4, 8]
>>> dif('stackoverflow', 'stacklavaflow')
[5, 6, 7, 8]
Brigand
  • 84,529
  • 20
  • 165
  • 173
  • Don't use lambda when you don't have to: `def dif(a, b): return [i for i in range(len(a)) if a[i] != b[i]]` – Steven Rumbalski Dec 17 '11 at 15:30
  • 1
    @StevenRumbalski, why not? It does the exact same thing... as far as I know. Is it just less Pythonic? Or slower? – Brigand Dec 17 '11 at 15:41
  • 2
    Lambdas are for *anonymous* functions and excellent for creating one liners. Here, you assign a name to the lambda, which suggests that it would be better as a regular function. Also, lambdas obscure your traceback when you have a bug.. – Steven Rumbalski Dec 17 '11 at 17:08
2

Pair up the strings character-by-character and iterate over this collection together with a counting index. Test whether the characters in each pair differ; if they do, output the index of where.

Using Python builtin functions you can do this neatly in one line:

>>> x = 'HELPMEPLZ'
>>> y = 'HELPNEPLX'
>>> {i for i, (left, right) in enumerate(zip(x,y)) if left != right}
{8, 4}
Katriel
  • 120,462
  • 19
  • 136
  • 170
  • Although the OP didn't say so, I'd be willing to bet that a solution that produced the errant indices in ascending order would be preferable. – John Machin Dec 17 '11 at 20:55
  • @Linus: if you want this, change the `{}` brackets to `[]`. That will give you a list comprehension instead of a set comprehension, so preserves the order. – Katriel Dec 17 '11 at 21:16
0

Easiest way is to split data into two char arrays and then loop through comparing the letters and return the index when the two chars do not equal each other.

This method will work fine as long as both strings are equal in length.

Dobbo1989
  • 303
  • 6
  • 9