6

I have to write a function that takes a string as argument and compair this string to two other strings and return the string most similar and the number of differences.

def func("LUMB"):
    lst=["JIBM", "NUNE", "NUMB"]
should return:
("NUMB",1)

I have tried:

def f(word):
    lst=["JIBM", "NUNE", "NUMB"]
    for i in lst:
        d=k(word, lst)
        return differences
        for n in d:
            print min(sum(n))

where:

def k(word1, word2):
    L=[]
    for w in range(len(word1)):
        if word1[w] != word2[w]:
            L.append(1)
        else:
            L.append(0)
    return L

so that i get a list of eg, [1,0,0,0] if word1="NUMB" and word2="LUMB"

talonmies
  • 70,661
  • 34
  • 192
  • 269
Linus Svendsson
  • 1,417
  • 6
  • 18
  • 21
  • 3
    Have you seen [Text difference algorithm](http://stackoverflow.com/questions/145607/text-difference-algorithm) and [Good Python modules for fuzzy string comparison](http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison) – Chris Dec 15 '11 at 11:21
  • A number of answers would be available on this link too http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison – Zain Khan Dec 15 '11 at 11:32
  • On the site there is a similar post. You would get some more valuable answers over here http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison – Zain Khan Dec 15 '11 at 11:33
  • Please don't vandalize your questions by deleting the code. – talonmies Dec 18 '11 at 13:13
  • @talonmies Has the code been restored after being deleted? – Anderson Green Oct 17 '12 at 00:35

2 Answers2

10

Looks like Shawn Chin has provided the best solution, but if you're prevented from using non-builtin modules, it seems like get_close_matches from difflib might help:

import difflib
difflib.get_close_matches("LUMB", ["JIBM", "NUNE", "NUMB"], 1)

The number of differences can be gotten using the get_opcodes method of SequenceMatcher and working with its return value.

Lauritz V. Thaulow
  • 49,139
  • 12
  • 73
  • 92
6

Using pylevenshtein to calculate Levenshtein distance:

>>> from Levenshtein import distance
>>> from operator import itemgetter
>>> lst = ["JIBM", "NUNE", "NUMB"]
>>> min([(x, distance("LUMB", x)) for x in lst], key=itemgetter(1))
('NUMB', 1)

Or, as a function:

from Levenshtein import distance
from operator import itemgetter
def closest(word, lst):
    return min([(x, distance(word, x)) for x in lst], key=itemgetter(1))

print closest("NUMB", ["JIBM", "NUNE", "NUMB"])

p.s. If you want to avoid additional dependencies, you could always implement your own function for calculating the distance. For example, several version are proposed in wikibooks each with their own pros and cons.

However, if performance is a concern, do consider sticking to the custom built modules. Apart from pylevenshtein, there's also python-levenshtein and nltk.metrics.distance (if you happen to already use NLTK).

Shawn Chin
  • 84,080
  • 19
  • 162
  • 191