I've been using this VBA solution by smirkingman from another similar question for calculating Levenshtein distance between strings. I have a need to translate this to an M code function in Excel Power Query, but don't have the know-how to do so.
Hoping someone can help me out. The 3 basic transformations between strings used in Levenshtein distance are below. Each counts as 1 step. More steps = greater distance between strings.
- Insertion
- Deletion
- Substitution
I thought I could "cheat" and not use a For loop-type structure as shown in the VBA example, but the test results below show that I need a more robust solution.
let
result = (s1 as text, s2 as text) as number =>
List.Max({Text.Length(s1),Text.Length(s2)}) - List.Count(List.Intersect({Text.ToList(s1), Text.ToList(s2)}))
in
result
Test Results
s1 | s2 | result | explanation |
---|---|---|---|
pale | pole | 1 | substitution |
dole | sale | 2 | substitution (x2) |
pool | spool | 1 | insertion |
two | one | 2 (incorrect) | substitution and/or insert/delete (3 steps min) EXPECTED: 3 |