6

I'm sitting here and programmnging some algorithms for my main program in Java (well the first one so far). I programmed the levenshtein algorithm just fine thanks to wiki being so nice with the pseudocode to newbeginners plus a nice tutorial :D

I then decided to upgrade to Damerau and added the extra lines but then I read that it's not DL algo but OptimalStringAlignmentDistance instead. I tried reading the actionscript code to understand what more I needed to add to make it to DL but got confused instead. I've been to different places with code looking alike to Java but they all use the wrong pseudocode too.

After spending half the day I gave up and decided to ask here. Is there anyone who can help me with upgrading this code to Damerau-Levenshtein in Java?

    public class LevensteinDistance {
        private static int Minimum(int a, int b, int c) {
            return Math.min(Math.min(a, b), c);
        }

        private static int Minimum (int a, int b) {
            return Math.min(a, b);
        }

        public static int computeLevensteinDistance(String s, String t){
            int d[][];
            int n; // length of s
            int m; // length of t
            int i; // iterates through s
            int j; // iterates through t
            char s_i; // ith character of s
            char t_j; // jth character of t
            int cost; // cost

            n = s.length ();
            m = t.length ();
            if (n == 0) {
                return m;
            }
            if (m == 0) {
                return n;
            }
            d = new int[n+1][m+1];

            for (i = 0; i <= n; i++) {
                d[i][0] = i;
            }

            for (j = 0; j <= m; j++) {
                d[0][j] = j;
            }

            for(i = 1; i <= n; i++) {
                s_i = s.charAt (i - 1);
                for(j = 1; j <= m; j++) {
                    t_j = t.charAt (j - 1);

                    if(s_i == t_j){
                        cost = 0;
                    }else{
                        cost = 1;
                    }
                    d[i][j] = Minimum(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);

                    if(i > 1 && j > 1 && s_i == t_j-1 && s_i-1 == t_j){ 
                        d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
                    }
                }
            }
        return d[n][m];
    }

    //    public static void main(String[] args0){
    //      String a = "I decided it was best to ask the forum if I was doing it right";
    //      String b = "I thought I should ask the forum if I was doing it right";
    //      System.out.println(computeLevensteinDistance(a, b));
    //    }
}

Here is the Wikipedia page for the Damerau–Levenshtein distance algorithm

MarcoS
  • 13,386
  • 7
  • 42
  • 63
N00programmer
  • 1,111
  • 4
  • 13
  • 17

2 Answers2

11

Your problem is in referencing the previous characters from the string in your conditional. In your original code you have:

if(i > 1 && j > 1 && s_i == t_j-1 && s_i-1 == t_j){ 
  d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
}

The problem is the values t_j-1 and s_i-1. These say the ith character of s and t minus 1, where the algorithm says you want the (ith minus 1) characters. For example if string s is "AFW" and i is 1 then

s_i - 1 = E; //the character value (s[1]='F') minus 1 = 'E'
s.charAt(i-1) = A; //i-1 = 0, s[0] = 'A'

so your conditional should read:

if(i > 1 && j > 1 && s_i == t.charAt(j-1) && s.charAt(i-1) == t_j) { 
  d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
}

EDIT: Unforutnately I do not understand the algorithm from reading the code, however here is a translation of the ActionScript example from the wikipedia page in Java, which gives an output that matches your example:

public static int damerauLevenshteinDistance(
      String a, String b, int alphabetLength) {
    final int INFINITY = a.length() + b.length();
    int[][] H = new int[a.length()+2][b.length()+2];  
    H[0][0] = INFINITY;
    for(int i = 0; i<=a.length(); i++) {
      H[i+1][1] = i;
      H[i+1][0] = INFINITY;
    }
    for(int j = 0; j<=b.length(); j++) {
      H[1][j+1] = j;
      H[0][j+1] = INFINITY;
    }      
    int[] DA = new int[alphabetLength];
    Arrays.fill(DA, 0);
    for(int i = 1; i<=a.length(); i++) {
      int DB = 0;
      for(int j = 1; j<=b.length(); j++) {
        int i1 = DA[b.charAt(j-1)];
        int j1 = DB;
        int d = ((a.charAt(i-1)==b.charAt(j-1))?0:1);
        if(d==0) DB = j;
        H[i+1][j+1] =
          min(H[i][j]+d,
              H[i+1][j] + 1,
              H[i][j+1]+1, 
              H[i1][j1] + (i-i1-1) + 1 + (j-j1-1));
      }
      DA[a.charAt(i-1)] = i;
    }
    return H[a.length()+1][b.length()+1];
  }

  private static int min(int ... nums) {
    int min = Integer.MAX_VALUE;
    for (int num : nums) {
      min = Math.min(min, num);
    }
    return min;
  }
M. Jessup
  • 8,153
  • 1
  • 29
  • 29
  • Ahh thanks for pointing out that stupid mistake. I didn't mean that. You see now(after you corrected it) the Java code is still not finding Levenshtein Damerau Distance, instead it's finding OptimalStringAlignmentDistance, according to the wiki page. The example is given with that **LD(CA,ABC)** would give a result on **2** because **CA -> AC -> ABC** while **OSA(CA,ABC)** will give one on **3** because of **CA -> A -> AB -> ABC** which this code does. – N00programmer May 18 '11 at 09:27
  • Didn't realize the Levenshtein Damerau was a different algorithm, I have edited the answer to include a translation of the actionscript from wikipedia. – M. Jessup May 18 '11 at 13:54
  • Thanks a lot :D I can only see that it reminds a lot about OSA but dunno the alphabetLenght (which got nothing to do with the letters in the alphabet...I set it to 100k). It seems to be there for the real transpositions. Still it works and perfectly. Again thanks a lot. I'll go through it while programming it and see if I can get something out of it before I move to the next algorithm to implement. – N00programmer May 18 '11 at 20:00
  • 2
    Following is a link to an explanation of the full Damerau-Levenshtein edit distance algorithm as well as a Java implementation (hope this is helpful to someone): http://software-and-algorithms.blogspot.com/2012/09/damerau-levenshtein-edit-distance.html – Kevin L. Stern Sep 16 '12 at 20:34
0

I think a SparseArray can be used for DA, that way it is not necessary to know the exact size of the alphabet.

SparseArray<Integer> DA = new SparseArray<Integer>();
  ...
    int i1 = DA.get(b.charAt(j - 1), 0);
trans
  • 1,411
  • 1
  • 13
  • 13