fuzzywuzzy is a very popular library for string matching. As per the documentation of the library, it is mentioned that it uses Levenshtein distance for computing the differences between sequences. But upon close inspection, I find that it actually uses the SequenceMatcher
function from the difflib
library. And this function, as per documentation uses the Ratcliff/Obershelp pattern-matching algorithm.
As per the definitions, Levenshtein distance is the minimum number of edits needed to transform one string into the other, and Ratcliff/Obershelp pattern-matching algorithm computes the doubled number of matching characters divided by the total number of characters in the two strings. A close related post comparing both.
And when I run an example, I get the same result for SequenceMatcher and ratio function in fuzzywuzzy.
from difflib import SequenceMatcher
from fuzzywuzzy import fuzz
s = SequenceMatcher(None, "abcd", "bcde")
s.ratio()
# 0.75
fuzz.ratio("abcd", "bcde")
# 75
If I compute the Levenshtein distance manually between the two strings, I guess it will be just 2. In this case, how does it become that it uses Levenshtein distance as the contributors write in the documentation?