4

I am trying to match two strings sequentially till the first the non-matched character and then determine the percentage exact match. My code is like this:

def match(a, b):
    a, b = list(a), list(b)
    count = 0
    for i in range(len(a)):
        if (a[i]!= b[i]): break
        else: count = count + 1
    return count/len(a)

a = '354575368987943'
b = '354535368987000'
c = '354575368987000'
print(match(a,b)) # return 0.267
print(match(a,c)) # return 0.8

Is there any built-in method in python already which can do it faster ? For simplicity assume that both strings are of same length.

muazfaiz
  • 4,611
  • 14
  • 50
  • 88
  • The closest thing to this is `difflib`'s [`SequenceMatcher.get_matching_blocks`](https://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_matching_blocks) : http://ideone.com/wlUVd9 – Ashwini Chaudhary Jul 17 '17 at 21:42
  • strings can be manipulated as lists, there is no need to `list()` them. – TemporalWolf Jul 17 '17 at 21:43
  • Best answer was already provided in the comment to https://stackoverflow.com/questions/18715688/find-common-substring-between-two-strings – AGN Gazer Jul 17 '17 at 21:50
  • Also check out https://codereview.stackexchange.com/questions/124144/finding-the-fastest-common-prefix-of-2-strings-in-python – AGN Gazer Jul 17 '17 at 21:53

3 Answers3

7

There's no built-in to do the entire thing, but you can use a built-in for computing the common prefix:

import os
def match(a, b):
    common = os.path.commonprefix([a, b])
    return float(len(common))/len(a)    
bogdanciobanu
  • 1,603
  • 1
  • 12
  • 14
4

I don't think there is such build-in method.

But you can improve your implementation:

  • No need to wrap the inputs in list(...). Strings are indexable.
  • No need for count variable, i already carries the same meaning. And you can return immediately when you know the result.

Like this, with some doctests added as a bonus:

def match(a, b):
    """
    >>> match('354575368987943', '354535368987000')
    0.26666666666666666

    >>> match('354575368987943', '354575368987000')
    0.8

    >>> match('354575368987943', '354575368987943')
    1
    """
    for i in range(len(a)):
        if a[i] != b[i]:
            return i / len(a)

    return 1
janos
  • 120,954
  • 29
  • 226
  • 236
0

alternative

(Just now saw that the answer below me thought of the same thing while I was editing the post)

def match(l1, l2):
    # find mismatch
    try:
        stop = next(i for i, (el1, el2) in enumerate(zip(l1, l2)) if el1 != el2)
        return stop/len(l1)
    except StopIteration:
        return 1
Anton vBR
  • 18,287
  • 5
  • 40
  • 46