I know the implementation of the edit distance algorithm. By dynamic programming, we first fill the first column and first row and then the entries immediately right and below of the filled entries by comparing three paths from the left, above, and left above. While for the Ratcliff/Obershelp algorithm, we first extract the longest common substring out from the two strings, then we do recursive operations for the left side two sub-strings and right side two sub-strings until no characters are left.
Both of them can be utilized to calculate the similarity between two strings and transform one string into another using four operations: delete, replace, copy, and insert.
But I wonder when to use which between SequenceMatcher in edit distance and that in difflib?
Here is what I found on the internet that makes me think that this question would also benefit others:
- In the documentation of edit distance it reads that
Similar to the difflib SequenceMatcher, but uses Levenshtein/edit distance.
- In this answer to a question on calculating edit distance, an answer on Ratcliff/Obershelp algorithm was provided.
- There are only a few resources about the Ratcliff/Obershelp algorithm, let alone its comparison to edit distance that I thought is the most well known string alignment algorithm.
So far as I know, I have the following ideas:
I find that edit distance and the Ratcliff/Obershelp algorithm can both be used for spell checking. But when to use which?
I thought the edit distance is employed to find the minimal edit sequence while the Ratcliff/Obershelp algorithm yields matches that "look right" to people. However, 'look right' seems too vague a term, especially in real world applications. What's more, when is the minimum edit sequence a must/preferred?
Any suggestions would be highly appreciated, and thanks in advance.