Compare strings, allowing one character difference

Question

I have been trying to implement tabular method for simplification of boolean expressions in python. For that I need to check whether two given strings differ at only one index for example, the function should return the following for the following examples:

0011 and 0111 - true as the two differ only at position 1
0-001 and 0-101 - true as differ at only 2
0-011 and 0-101 - false as differ at 2,3

right now I am using the following function:

def match(s1,s2):

    l=[False,-1]##returns false when they cant be combined
    for i in range(len(s1)):
        if s1[:i]==s2[:i] and s1[i]!=s2[i] and s1[i+1:]==s2[i+1:]:
            l= [True,i]
            break
    return l

I want to implement it in a very fast manner (low complexity). Is there a way to do so in python?

i mean that the two strings should differ at exactly one position — Deepak Saini, Aug 09 '14 at 07:58
possible duplicate of [Good Python modules for fuzzy string comparison?](http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison) — Back2Basics, Aug 09 '14 at 08:11

score 12 · Answer 1 · edited Dec 17 '22 at 20:45

12

I used the Levenshtein distance to match strings that are different by one character added or removed only (and not simply replaced). So:

John's dog
Johns dog

are considered as a match.

The Levenshtein distance is the number of edits it would take to replace one string by another. The Levenshtein distance of the two above strings is 1, because the only thing to do is to remove one character.

To use it, you can simply install the according Levenshtein python module:

pip install levenshtein

and then use it:

from Levenshtein import distance 
def match(s1, s2):
    return distance(s1, s2) <= 1

edited Dec 17 '22 at 20:45

Apollo-Roboto

623
5
21

answered May 08 '16 at 16:20

beledouxdenis

171
2
4

The readme of that package has a warning saying that it's been renamed to simply `levenshtein`. The code is still the same. – Apollo-Roboto Dec 16 '22 at 21:30

Marco Sulla · Accepted Answer · 2014-08-09T09:21:50.270

This is a more well-performing solution, coded in Python 3:

def match(s1, s2):
    ok = False

    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            if ok:
                return False
            else:
                ok = True

    return ok

I do not checked for length difference because you said the two strings are equal, but for a more general approach I would add it.

If you need the position of the different character:

def match(s1, s2):
    pos = -1

    for i, (c1, c2) in enumerate(zip(s1, s2)):
        if c1 != c2:
            if pos != -1:
                return -1
            else:
                pos = i

    return pos

These are benchmarks performed with timeit, tested with match("0-001", "0-101"). I translated all solutions to py3 and removed length test.

your solution: 5.12
Martijn Pieters' solution: 4.92
enrico.bacis' and lakesh's solution: 5.51
my solution: 2.42

Tests with a longer string:

Martijn Pieters' solution:

timeit.timeit('match("0-0016ub5j2oi06u30tj30g6790v3nug[hoyj39867i6gy9thvb05y4b896y3n098vty98thn98qg5y4n8ygnqp", "0-0016ub5j2oi06u30tj30g6790v3gug[hoyj39867i6gy9thvb05y4b896y3n098vty98thn98qg5y4n8ygnqp")', setup="""
def match(s1, s2):
    combo = zip(s1, s2)
    return any(c1 != c2 for c1, c2 in combo) and all(c1 == c2 for c1, c2 in combo)
""")

result: 32.82

My solution:

timeit.timeit('match("0-0016ub5j2oi06u30tj30g6790v3nug[hoyj39867i6gy9thvb05y4b896y3n098vty98thn98qg5y4n8ygnqp", "0-0016ub5j2oi06u30tj30g6790v3gug[hoyj39867i6gy9thvb05y4b896y3n098vty98thn98qg5y4n8ygnqp")', setup="""
def match(s1, s2):
    ok = False

    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            if ok:
                return False
            else:
                ok = True

    return ok
""")

Result: 20.21

Instead of setting `ok` to False and breaking, you can just `return False` at that point. — Martijn Pieters, Aug 09 '14 at 08:36
Can you update this with benchmarks including all test inputs and also *longer* strings? — Martijn Pieters, Aug 09 '14 at 08:37
@lucas its a very efficient way.Is there a way ,your function can be modified to give position(index) of difference also.Thanks a lot. — Deepak Saini, Aug 09 '14 at 09:04
@DeepakSaini: you can do: `pos=-1; for i, cc in enumerate(zip(s1, s2)): if cc[0] != cc[1]: if pos != -1: return -1; else: pos =i` and `return pos`. You don't need to return also `True` or `False` if you need the position, because `-1` means there's no match. — Marco Sulla, Aug 09 '14 at 09:13
@lucas thanks a lot.I have been stuck into the problem for quite sometime — Deepak Saini, Aug 09 '14 at 09:23

Martijn Pieters · Answer 3 · 2014-08-09T08:11:22.257

4

In Python 2, use future_builtins.zip() (Python 2 and 3 compatible zip()-as-iterator) (in Python 3 the built-in will do fine) to combine the two strings character by character, then use any() and all() to loop over the resulting iterator:

try:
    from future_builtins import zip
except ImportError:
    pass

def match(s1, s2):
    if len(s1) != len(s2):
        return False
    combo = zip(s1, s2)
    return any(c1 != c2 for c1, c2 in combo) and all(c1 == c2 for c1, c2 in combo)

This works because combo is an iterator; it yields pairs of characters one by one on demand, and any() and all() only take as many pairs as needed to determine their outcome.

any() stops iterating as soon as it finds two characters that are not equal. all() will only return True if all the remaining characters are equal.

Together the two conditions then are only True if there is exactly one pair that differs.

Because an iterator approach is used, the above does the absolute minimum amount of work to determine if your strings are a match; the moment a second pair is found that doesn't match iteration stops; there is no point in looking at the rest of the character combinations.

Demo (Python 2, so zip() is imported):

>>> from future_builtins import zip
>>> def match(s1, s2):
...     if len(s1) != len(s2):
...         return False
...     combo = zip(s1, s2)
...     return any(c1 != c2 for c1, c2 in combo) and all(c1 == c2 for c1, c2 in combo)
... 
>>> match('0011', '0111')
True
>>> match('0-001', '0-101')
True
>>> match('0-011', '0-101')
False

edited Aug 09 '14 at 08:11

answered Aug 09 '14 at 07:40

Martijn Pieters

1,048,767
296
4,058
3,343

@ Matrtijn Pieters thanks.But iam getting an error that there is no module named future_builtins – Deepak Saini Aug 09 '14 at 07:50
@DeepakSaini: What version of Python is this? You can also use `from itertools import izip as zip`. – Martijn Pieters Aug 09 '14 at 07:51
i am using python 3.3.2 – Deepak Saini Aug 09 '14 at 07:52
@DeepakSaini: right, then `zip()` is already the 'future builtin'; adjusted to ignore the import error in that case. – Martijn Pieters Aug 09 '14 at 07:54
i dont get it.i can see that the method you have given will do my work .But i am not able to import the library. – Deepak Saini Aug 09 '14 at 08:01
@DeepakSaini: The library is *only for Python 2*. You can ignore it in Python 3. You didn't state a version, so I built my answer for both. – Martijn Pieters Aug 09 '14 at 08:04
1

Very cool use of zip as an iterator. However, it looks like it will return false in the event that the strings are equal -- the any expression will return false and so the whole logical expression returns false. – Dunes Aug 09 '14 at 08:24
@Dunes: In that case it *should* return False, as in that case the strings do *not* differ by 1 character. – Martijn Pieters Aug 09 '14 at 08:25
That's what I meant. `False and blah` is `False`. Thus, `match("1", "1")` returns false, and I would expect it to return true. Just seen that you've explicitly stated that that is the case. Just that it doesn't completely answer the given question. Sorry for the confusion. – Dunes Aug 09 '14 at 08:30
1

@Dunes: Nowhere in the question is it stated that True should be returned if the strings do not differ at all. – Martijn Pieters Aug 09 '14 at 08:33
1

That was my interpretation of the title when it said "allowing one", ie. permitting at most one. But the rest of the question favours your interpretation. I've realised that converting `any(...) and all(...)` to `any(...) or all(...)` would answer my interpretation of the question. – Dunes Aug 09 '14 at 08:38
@Dunes: right; in that case you'd get True for **at most** 1 difference indeed. – Martijn Pieters Aug 09 '14 at 08:47

score 3 · Answer 4 · edited Aug 09 '14 at 08:07

3

def match(a,b):
    s = sum([a[i] != b[i] for i in range(len(a))])
    if s == 1:
       return True
    else:
       return False

edited Aug 09 '14 at 08:07

Artjom B.

61,146
24
125
222

answered Aug 09 '14 at 07:44

lakshmen

28,346
66
178
276

1

Or you could `return s<2` – spicavigo Aug 09 '14 at 09:02

enrico.bacis · Answer 5 · 2014-08-09T07:55:21.003

Same Lengths

If the two string have the same length:

def match(s1, s2):
    return sum(s1[i] != s2[i] for i in xrange(len(s1))) <= 1

You can also create a generator of matchers like this:

def create_matcher(maxdiff):
    return lambda s1, s2: sum(s1[i] != s2[i] for i in xrange(len(s1))) <= maxdiff

match = create_matcher(1)

Example:

print match('0011', '0111')      # True
print match('0-001', '0-101')    # True
print match('0-011', '0-101')    # False

Different Lengths

If they have different length we can assume that the exceeding characters are different, so:

def match(s1, s2):
    l = min(len(s1), len(s2))
    return sum(s1[i] != s2[i] for i in xrange(l)) + abs(len(s1) - len(s2)) <= 1

Compare strings, allowing one character difference

5 Answers5

Same Lengths

Different Lengths

Linked