0

In my project the user input is translated to a string in the form of [x,x,x,x,x,x,x,x,x,x] where x is a number between 1 and 8 and stored to a library of this type of strings. Later I have to compare the user new input with every string in that library.

So I'm trying to find the similarity between two strings of the mention format. I tried the Levenhstein distance algorithm but it does not suit my needs. For the strings [1,8,7,6,5,4,3,2,0,0] and [1,7,6,5,4,3,2,0,0,0], Levenhstein finds distance of 7 while in my prospective the distance is only one. Levenhstein only edits each character but does not shift the characters.

Can someone suggest another spell-check or string-compare algorithm with the criteria I mentioned before?

Levenshtein algorithm I used:

public static int getLevenshteinDistance(String s, String t)
{
   if (s == null || t == null)
   {
      throw new IllegalArgumentException("Strings must not be null");
   }
   int d[][]; // matrix
   int n; // length of s
   int m; // length of t
   int i; // iterates through s
   int j; // iterates through t
   char s_i; // ith character of s
   char t_j; // jth character of t
   int cost; // cost

   // Step 1
   n = s.length();
   m = t.length();
   if (n == 0)
   {
      return m;
   }
   if (m == 0)
   {
      return n;
   }
   d = new int[n + 1][m + 1];

   // Step 2
   for (i = 0; i <= n; i++)
   {
      d[i][0] = i;
   }

   for (j = 0; j <= m; j++)
   {
      d[0][j] = j;
   }

   // Step 3
   for (i = 1; i <= n; i++)
   {
      s_i = s.charAt(i - 1);

      // Step 4
      for (j = 1; j <= m; j++)
      {
         t_j = t.charAt(j - 1);

         // Step 5
         if (s_i == t_j)
         {
            cost = 0;
         }
         else
         {
            cost = 1;
         }

         // Step 6
         d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      }
   }

   // Step 7
   return d[n][m];
}
Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
Anonymous
  • 4,470
  • 3
  • 36
  • 67
  • Levenhstein distance includes insertions and deletions, so it's supposed to work. – Bernhard Barker Dec 20 '13 at 19:59
  • pehaps i used an older version of the algorithm that does not support that. i'm updating the question with the code i used – Anonymous Dec 20 '13 at 21:19
  • [I get `2`, not `7`, when running your code](http://ideone.com/N8NjUc). And it's `2`, not `1`, because there are 3 0's at the end of the one but 2 at the end of the other - you'd have to explicitly cater for this if you wish to ignore the trailing 0's. – Bernhard Barker Dec 21 '13 at 07:55
  • For the algorithm suggestion part, see this question - [What algorithm gives suggestions in a spell checker?](http://stackoverflow.com/q/2294915) – Bernhard Barker Dec 21 '13 at 08:01
  • i got the right result when i converted [x,x,x,x,x,x,x,x,x,x] to "xxxxxxxxxx" – Anonymous Dec 21 '13 at 13:07

0 Answers0