python search technology: word similarity

Question

I want to get a similarity percentage for two words, e.g.:

abcd versus zzabcdzz == 50% similarity

It doesn't need to be very accurate. Is there any way to do that? I am using Python, but feel free to recommend other languages.

possible duplicate of [Text difference algorithm](http://stackoverflow.com/questions/145607/text-difference-algorithm) — tzot, Feb 12 '11 at 12:04

Mark Byers · Accepted Answer · 2011-02-12T06:11:21.413

Try using python-Levenshtein to calculate the edit distance.

The Levenshtein Python C extension module contains functions for fast computation of:

- Levenshtein (edit) distance, and edit operations
- string similarity
- approximate median strings, and generally string averaging
- string sequence and set similarity

You can get a rough idea of similarity by dividing the edit distance between the two strings by the length of the longer string and subtracting the result from 1. In your example the edit distance is 4 and the maximum possible edit distance is 8, so the similarity is 1 - 4/8 = 50%.
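The calculation above can be sketched in pure Python. This is a minimal illustration, not the python-Levenshtein C extension itself: the `levenshtein` and `similarity` functions below are hypothetical helpers written for this example, using the standard dynamic-programming edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string (1.0 for two empty strings)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(similarity("abcd", "zzabcdzz"))  # 0.5
```

If python-Levenshtein is installed, its `Levenshtein.distance(a, b)` should be usable in place of the hand-written `levenshtein` function and will be much faster on long strings.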

score 3 · Answer 2 · answered Feb 12 '11 at 06:34

You could use Python's built-in difflib module.

Here's an example from its documentation:

>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
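Note that `ratio()` is not the same measure as an edit-distance percentage: it is documented as 2.0*M/T, where M is the number of matched elements and T is the combined length of both sequences. As a rough check, the sketch below applies it to the pair from the question and recomputes the value by hand from the matching blocks; the variable names are illustrative only.

```python
from difflib import SequenceMatcher

a, b = "abcd", "zzabcdzz"
matcher = SequenceMatcher(None, a, b)

# Total number of matched characters across all matching blocks
# (the final block is a zero-size sentinel, so it adds nothing).
matched = sum(block.size for block in matcher.get_matching_blocks())

ratio = matcher.ratio()
manual = 2.0 * matched / (len(a) + len(b))

print(matched, ratio, manual)
```

Here the four characters of "abcd" match out of twelve characters in total, so the ratio comes out near 67% rather than the 50% in the question.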

TigrisC

score 1 · Answer 3 · answered Feb 12 '11 at 06:25

Some similarity metrics from the NLTK library (these are WordNet-based, so they measure semantic rather than spelling similarity):

http://www.opendocs.net/nltk/0.9.5/api/nltk.wordnet.similarity-module.html

Asterisk

score 0 · Answer 4 · edited May 23 '17 at 11:52

Copying from an answer to the linked duplicate question:

In Python, there is difflib.

difflib offers the SequenceMatcher class, which can be used to give you a similarity ratio. Example function:

import difflib

def text_compare(text1, text2, isjunk=None):
    return difflib.SequenceMatcher(isjunk, text1, text2).ratio()
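As one possible use, the ratio can rank a list of candidate words against a query. The sketch below repeats the `text_compare` function so that it runs on its own; the `rank_candidates` helper and the sample word list are hypothetical, added only for illustration.

```python
import difflib


def text_compare(text1, text2, isjunk=None):
    return difflib.SequenceMatcher(isjunk, text1, text2).ratio()


def rank_candidates(query, candidates):
    """Return (candidate, ratio) pairs, most similar first (hypothetical helper)."""
    scored = [(word, text_compare(query, word)) for word in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_candidates("appel", ["ape", "apple", "peach", "puppy"])
for word, score in ranked:
    print(f"{word}: {score:.0%}")
```

The standard library also provides `difflib.get_close_matches`, which performs a similar ranking with a cutoff and may be enough if only the best few matches are needed.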

edited May 23 '17 at 11:52

Community

answered Feb 12 '11 at 12:03

tzot
