
After searching Google and StackOverflow I couldn't find any resource about comparing how close two strings are in Java; I only found results about the difference between == and equals...

Does anyone know a library that can compare the "proximity" between two strings and give a percentage of proximity?

Example: car and bar are very close, while chicken and dog are very different.

The idea is to compare, for example, a city typed by a user with the cities already in my database, to avoid duplicate data. For example, if the user writes "NewYork", I could ask: "Did you mean 'New-York'?"
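A minimal sketch of that "Did you mean ...?" check, under assumptions that are not from the question: both strings are normalized by lower-casing and stripping everything that is not a letter or digit (so "NewYork" and "New-York" compare equal), the closest known city is chosen by Levenshtein edit distance, and a hypothetical `maxDistance` threshold decides whether the best match is close enough to suggest. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.Locale;

public class CitySuggester {

    // Hypothetical normalization: lower-case and drop every character that is
    // not a letter or digit, so "NewYork", "New-York" and "new york" all map to "newyork".
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Levenshtein edit distance (insertions, deletions, substitutions),
    // computed with two rolling rows of the dynamic-programming table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the known city whose normalized form is closest to the input,
    // or null if even the best candidate is more than maxDistance edits away.
    static String suggest(String input, List<String> knownCities, int maxDistance) {
        String key = normalize(input);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String city : knownCities) {
            int d = levenshtein(key, normalize(city));
            if (d < bestDistance) {
                bestDistance = d;
                best = city;
            }
        }
        return bestDistance <= maxDistance ? best : null;
    }

    public static void main(String[] args) {
        List<String> cities = List.of("New-York", "Newark", "Paris");
        String suggestion = suggest("NewYork", cities, 2);
        if (suggestion != null) {
            System.out.println("Did you mean \"" + suggestion + "\"?");
        }
    }
}
```

Normalizing before measuring the distance means punctuation and case differences cost nothing, so the edit distance only reflects genuine spelling differences.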

Thanks a lot :)

c4k
  • Google for "nlp word similarity" – Ajay George Feb 14 '13 at 15:16
  • Thanks for the answer, but isn't NLP for finding synonyms or related words by analyzing their meaning? From the WordNet description: "WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets)". What I actually want is to compare strings by their characters, not by their meaning. Any idea? – c4k Feb 14 '13 at 15:40
  • For anyone who finds this topic: I used the Levenshtein distance algorithm. It's probably not the best option, but it fits my needs. It is available in StringUtils (Apache Commons Lang). – c4k Feb 14 '13 at 17:10
  • http://stackoverflow.com/q/307291/628943 , http://stackoverflow.com/q/41424/628943 might be of interest here – Ajay George Feb 15 '13 at 09:25
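Turning a Levenshtein distance into the "percentage of proximity" asked for in the question needs a normalization step. The sketch below assumes one common choice, `1 - distance / length of the longer string`; other normalizations are possible. It implements the distance directly so it has no dependencies; Apache Commons Lang's `StringUtils.getLevenshteinDistance` (deprecated since Lang 3.6 in favour of Commons Text's `LevenshteinDistance`) computes the same distance. The class and method names are illustrative.

```java
import java.util.Locale;

public class StringProximity {

    // Levenshtein edit distance: the minimum number of single-character
    // insertions, deletions or substitutions that turn a into b.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Proximity as a percentage: 100 for identical strings, 0 when every
    // position has to change. Assumes normalization by the longer length.
    public static double proximityPercent(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) return 100.0; // two empty strings are identical
        return 100.0 * (1.0 - (double) levenshtein(a, b) / longest);
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "car / bar: %.1f%%%n", proximityPercent("car", "bar"));
        System.out.printf(Locale.ROOT, "chicken / dog: %.1f%%%n", proximityPercent("chicken", "dog"));
    }
}
```

With this normalization, car/bar comes out at about 66.7% (one substitution out of three characters) and chicken/dog at 0%, since those two words share no characters.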



I've used the MongeElkan algorithm from SecondString; you could also look at Lucene's algorithms.

SecondString Link
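Monge-Elkan is a token-based hybrid: each token of the first string is matched to its most similar token in the second string, and those best scores are averaged. The sketch below is not SecondString's API. It swaps in normalized Levenshtein as the inner token similarity (SecondString's own `MongeElkan` class may use a different inner character-level measure) and uses a hypothetical tokenizer that splits on any non-alphanumeric character.

```java
import java.util.Locale;

public class MongeElkanSketch {

    // Levenshtein edit distance with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Inner token similarity in [0, 1]: 1 - distance / longer token length.
    static double tokenSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    // Hypothetical tokenizer: lower-case, then split on runs of non-alphanumerics.
    static String[] tokens(String s) {
        String cleaned = s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    // Monge-Elkan: average, over the tokens of s, of each token's best
    // similarity to any token of t. Note that it is not symmetric.
    static double mongeElkan(String s, String t) {
        String[] as = tokens(s);
        String[] bs = tokens(t);
        if (as.length == 0 || bs.length == 0) return as.length == bs.length ? 1.0 : 0.0;
        double sum = 0.0;
        for (String a : as) {
            double best = 0.0;
            for (String b : bs) best = Math.max(best, tokenSimilarity(a, b));
            sum += best;
        }
        return sum / as.length;
    }

    public static void main(String[] args) {
        System.out.println(mongeElkan("New York City", "new-york"));
        System.out.println(mongeElkan("new-york", "New York City"));
    }
}
```

The measure is asymmetric: "new-york" scores 1.0 against "New York City" because both of its tokens find exact matches, while the reverse direction scores about 0.67 because "city" has no counterpart. Working per token also makes it tolerant of word order, which a plain character-level edit distance is not.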

brightintro