0

Possible Duplicate:
Similarity String Comparison in Java

Hi all

I am trying to measure partial matching between two strings in Java. There are of course many questions and answers about this on Stack Overflow, but none of them fulfills my requirement. I have two strings (sentences), for example "strong java programming" and "Strong programing skill". I want to measure the degree of similarity between these two sentences as a value such as 25%, not just partial matching = true or false.

thanks

solidfox

3 Answers

6

You can use string distance determination algorithms like Levenshtein distance or Jaro-Winkler.
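To turn an edit distance into a percentage, one option is to divide it by the length of the longer string and subtract the result from 1. The following is only a minimal sketch of that idea using a hand-written Levenshtein distance; the class name `LevenshteinSimilarity`, the lower-casing step, and the normalization by the longer length are assumptions for illustration, not part of any particular library.

```java
import java.util.Locale;

// Hypothetical helper class, written for illustration only.
final class LevenshteinSimilarity {

    // Levenshtein distance computed with two rolling rows of the DP table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j chars of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i chars of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 means identical (ignoring case), 0.0 means nothing in common.
    static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT).trim();
        String y = b.toLowerCase(Locale.ROOT).trim();
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(x, y) / longest;
    }

    public static void main(String[] args) {
        double s = similarity("strong java programming", "Strong programing skill");
        System.out.printf("similarity: %.0f%%%n", s * 100);
    }
}
```

Note that this compares characters, not words, so word order matters a lot; whether that is the right notion of similarity for sentences depends on the use case.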

MRalwasser
3

Just use the String API and your own algorithms. Something like this:

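public static double similarity(String a, String b) {
  // Split a on whitespace; String.split requires a regex argument.
  String[] words = a.trim().split("\\s+");
  if (words.length == 0 || words[0].isEmpty()) {
    return 0.0; // a contains no words, so avoid dividing by zero
  }
  double count = 0;
  for (String word : words) {
    // Substring test: counts the word if it occurs anywhere in b.
    if (b.indexOf(word) != -1) {
      count++;
    }
  }
  // Fraction of a's words that were found in b.
  return count / words.length;
}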

The catch is that this isn't quite right: `indexOf` looks for the word anywhere in b, so "java" would also match inside "javascript". You want to do a better job of looking at the words in b. I just wanted to give you a general idea of what methods and structure you might want to have. You also want to sanitize your input: make it all lower case, remove punctuation, and so on.
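One way to act on those caveats is to sanitize both strings, split each into a set of whole words, and compare the sets. The sketch below uses the Jaccard index (shared words divided by all distinct words), which is a swapped-in measure rather than the fraction used above; the class name `WordOverlap` and the choice to return 0.0 when neither string contains a word are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, written for illustration only.
final class WordOverlap {

    // Lower-cases the input, replaces runs of non-letter/non-digit characters
    // with a single space, and returns the distinct words.
    static Set<String> tokens(String s) {
        String cleaned = s.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                .trim();
        Set<String> result = new HashSet<>();
        if (!cleaned.isEmpty()) {
            result.addAll(Arrays.asList(cleaned.split(" ")));
        }
        return result;
    }

    // Jaccard index over whole words: |A intersect B| / |A union B|.
    static double similarity(String a, String b) {
        Set<String> wordsA = tokens(a);
        Set<String> wordsB = tokens(b);
        if (wordsA.isEmpty() && wordsB.isEmpty()) {
            return 0.0; // assumed convention: nothing to compare
        }
        Set<String> common = new HashSet<>(wordsA);
        common.retainAll(wordsB);
        Set<String> union = new HashSet<>(wordsA);
        union.addAll(wordsB);
        return (double) common.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(similarity("strong java programming", "Strong programing skill"));
    }
}
```

For the two sentences from the question this gives 0.2: only "strong" is shared out of five distinct words, because "programming" and "programing" are spelled differently and count as different words. Tolerating such near-misses would need a fuzzy comparison per word, for example an edit distance threshold.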

tsm
  • 2
    Do you mean `words.length`, not `a.length()`? And in either case, you need to check for divide-by-zero. Otherwise, you beat me to it (+1) ;-) – DNA Jul 02 '12 at 21:52
  • Good catch (I was even thinking arrays (I did have `length` instead of `length()`)...just flubbed it). – tsm Jul 03 '12 at 01:48
1

You can take a look at this library: SimMetrics.

SimMetrics is a similarity metric library, covering edit distances (Levenshtein, Gotoh, Jaro, etc.) as well as other metrics (e.g. Soundex, Chapman).

aleroot
  • You can apply these metric algorithms to compare the similarity of strings; one common application is data mining ... – aleroot Jul 02 '12 at 21:55