0

I have written the algorithm below to compute the Levenshtein distance, and it seems to return the correct results based on my tests. The time complexity is O(n+m), and the space is O(1).

All the existing algorithms I've seen only for this have space complexity O(n*m), as they create a matrix. Is there something wrong with my algorithm?

public static int ComputeLevenshteinDistance(string word1, string word2)
{
    var index1 = 0;
    var index2 = 0;
    var numDeletions = 0;
    var numInsertions = 0;
    var numSubs = 0;
    while (index1 < word1.Length || index2 < word2.Length)
    {
        if (index1 == word1.Length)
        {
            // Insert word2[index2]
            numInsertions++;
            index2++;
        }
        else if (index2 == word2.Length)
        {
            // Delete word1[index1]
            numDeletions++;
            index1++;
        }
        else if (word1[index1] == word2[index2])
        {
            // No change as word1[index1] == word2[index2]
            index1++;
            index2++;
        }
        else if (index1 < word1.Length - 1 && word1[index1 + 1] == word2[index2])
        {
            // Delete word1[index1]
            numDeletions++;
            index1++;
        }
        else if (index2 < word2.Length - 1 && word1[index1] == word2[index2 + 1])
        {
            // Insert word2[index2]
            numInsertions++;
            index2++;
        }
        else
        {
            // Substitute word1[index1] for word2[index2]
            numSubs++;
            index1++;
            index2++;
        }
    }

    return numDeletions + numInsertions + numSubs;
}
dwally89
  • 51
  • 2
  • 7

1 Answers1

0

Was a comment, but I feel it is probably suitable as an answer:

Short answer is "no", if you want the true shortest distance for any given inputs.

The reason your code appears more efficient (and the reason that other implementations create a matrix instead of doing what you're doing) is that your stepwise implementation ignores a lot of potential solutions.

Examples @BenVoigt gave illustrate this, another perhaps clearer illustration is ("aaaardvark", "aardvark") returns 8, should be 2: it's getting tripped up because it's matching the first a and thinking it can move on, when in fact a more optimal solution would be to consider the first two characters insertions.

pcdev
  • 2,852
  • 2
  • 23
  • 39
  • Interesting side note that I haven't investigated: `("Aardvark", "aardvark")` returns 2 instead of 1 (or 0). Perhaps one thing you might improve if you decide to go with this routine anyway is to standardise the case before processing (or provide a case sensitive switch as a parameter) – pcdev Nov 28 '18 at 05:01