I have the following problem. I want to identify strings in Java that have a similar meaning. I tried calculating similarities between strings with Stringmetrics. This works as expected, but I need something better suited to this task.
For example, take the following two single-word strings:
String s1 = "apple";
String s2 = "appel";
Those two strings are very similar. When I use cosine similarity, I get the following result:
double score = cosine.compare(s1, s2); // 0.0
But when I use Damerau-Levenshtein similarity, I get the following result:
double score = damerauLevenshtein.compare(s1, s2); // 0.8
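As far as I understand, the difference comes from what each metric compares: the cosine metric seems to work on whole tokens split on whitespace, while Damerau-Levenshtein works on single characters. The following is a minimal, self-contained sketch of that idea and not the Stringmetrics implementation. The class and method names are made up for illustration, the cosine is simplified to sets of tokens, and the edit distance is the optimal string alignment variant of Damerau-Levenshtein.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of Stringmetrics.
class SimilaritySketch {

    // Optimal string alignment distance: insertions, deletions,
    // substitutions and swaps of two adjacent characters each cost 1.
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "le" <-> "el".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // Distance normalized by the longer string, so 1.0 means identical.
    static double editSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) osaDistance(a, b) / max;
    }

    // Cosine similarity over sets of whitespace-separated tokens.
    static double tokenCosine(String a, String b) {
        Set<String> ta = new HashSet<>(Arrays.asList(a.trim().split("\\s+")));
        Set<String> tb = new HashSet<>(Arrays.asList(b.trim().split("\\s+")));
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return common.size() / Math.sqrt((double) ta.size() * tb.size());
    }

    public static void main(String[] args) {
        System.out.println(tokenCosine("apple", "appel"));
        System.out.println(editSimilarity("apple", "appel"));
    }
}
```

With this sketch, "apple" and "appel" share no token, so the token-based cosine is 0, while the character-based score is 1 - 1/5 because a single swap of "l" and "e" turns one word into the other.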
The next problem is that many words have synonyms, and Stringmetrics does not take synonyms into account.
For example, these two strings should be considered the same:
String s3 = "purchase 10 bottles of water";
String s4 = "buy 10 waterbottles";
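To illustrate the kind of behavior I am after, here is a rough sketch that maps synonyms to one canonical word before comparing token sets. Everything in it is an assumption for illustration: the synonym table and stop-word list are hand-written and hypothetical, and the Jaccard overlap is just one possible score.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the synonym data below is made up.
class SynonymSketch {

    static final Map<String, String> SYNONYMS = new HashMap<>();
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("of", "the", "a"));

    static {
        // Map each synonym to one canonical form.
        SYNONYMS.put("purchase", "buy");
    }

    // Lowercase, split on whitespace, drop stop words, replace synonyms.
    static Set<String> normalize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (t.isEmpty() || STOP_WORDS.contains(t)) continue;
            tokens.add(SYNONYMS.getOrDefault(t, t));
        }
        return tokens;
    }

    // Jaccard overlap: shared tokens divided by all distinct tokens.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> a = normalize("purchase 10 bottles of water");
        Set<String> b = normalize("buy 10 waterbottles");
        System.out.println(jaccard(a, b));
    }
}
```

Even with the synonym mapped, the two example strings only share "buy" and "10" (2 of 5 distinct tokens), because "waterbottles" does not match "bottles" and "water". So a simple lookup table does not fully solve the problem, which is why I am looking for a better approach.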
I hope you guys can help me.