I need to normalize the Levenshtein distance to the range 0 to 1. I have seen several different variations on SO.
I am thinking of adopting the following approach:
- given two strings, s1 and s2
- int len = Math.max(s1.length(), s2.length());
- float normalized_distance = (float) (len - levenshteinDistance(s1, s2)) / (float) len;
Then the highest score, 1.0, means an exact match, and the lowest, 0.0, means no match at all (the distance equals the length of the longer string).
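To make the approach concrete, here is a minimal sketch of what I have in mind. The class name `NormalizedLevenshtein` and the methods `distance` and `similarity` are my own hypothetical names, not taken from any library. It assumes unit cost for insert, delete and substitute, compares UTF-16 `char` values, and treats two empty strings as an exact match so that the division by zero is avoided.

```java
// Sketch only: class and method names are hypothetical, not from any library.
final class NormalizedLevenshtein {

    private NormalizedLevenshtein() {}

    /** Plain Levenshtein distance (insert, delete, substitute each cost 1), two-row DP. */
    static int distance(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of s2 takes j insertions
        }
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i; // turning the first i chars of s1 into "" takes i deletions
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev; // swap rows so prev always holds the last completed row
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }

    /** Score in [0, 1]: 1.0 for identical strings, 0.0 when distance equals the longer length. */
    static double similarity(String s1, String s2) {
        int len = Math.max(s1.length(), s2.length());
        if (len == 0) {
            return 1.0; // assumption: two empty strings count as an exact match
        }
        return (double) (len - distance(s1, s2)) / len;
    }
}
```

For example, `NormalizedLevenshtein.similarity("kitten", "sitting")` works out to (7 - 3) / 7, since the distance is 3 and the longer string has 7 characters. Strictly speaking this value is a similarity rather than a distance, because higher means closer.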
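But I see other variations in these questions:
- "two whole texts similarity using levenshtein distance", which uses 1 - distance(a, b) / max(a.length, b.length) (as far as I can tell, this is algebraically the same as the formula above, since (len - d) / len = 1 - d / len)
- "Difference in normalization of Levenshtein (edit) distance?"
- "Explanation of normalized edit distance formula"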
Is there a canonical implementation of this in Java? As far as I know, org.apache.commons.text
only implements LevenshteinDistance, not a normalized Levenshtein distance.