1

I want to compute the edit distance between two strings, using 4 operations: character insertion, deletion, replacement and swapping. Each operation is associated with a cost. For instance, if the first three costs are equal to 10 and the last is equal to 1, we have:

distance(abc,bca) = 2 (swap a and b then swap a and c)

distance(abc,bae) = 11 (swap b and a then replace c by e)

distance(abcd,bdca) = 4

I read the Wikipedia article about Damerau-Levenshtein distance, but the algorithm they give works only if 2 × swapping cost ≥ insert cost + delete cost.

lovasoa
  • 6,419
  • 1
  • 35
  • 45
  • i don't think your function is well defined ... if weighted-levenshtein is defined as the minimum-cost sequence of edits to transform the strings, then for your example 2, it should be cost 4 (i.e. 2 deletes + 2 inserts); more generally the significance of the inequality "2 × swapping cost ≥ insert cost + delete cost" is that if it's not satisfied you will always just delete and insert rather than swapping to get a lower score. does this make sense? – maxymoo Nov 15 '17 at 23:51
  • Why wouldn't it make sense to swap several times rather than just inserting and deleting ? I do not see the problem in the definition. In my case, I want to favor having the same elements, even if they are far one to another, over having to insert completely new elements. – lovasoa Nov 16 '17 at 09:30
  • ah sorry misread your question, i think this is a (NP) hard problem, see e.g. https://stackoverflow.com/questions/18292202/finding-the-minimum-number-of-swaps-to-convert-one-string-to-another-where-the – maxymoo Nov 17 '17 at 02:41

0 Answers0