
I am working on a function that tolerates misspellings and aliases of person names. I did some research and found quite a number of string-metric algorithms, as well as phonetic libraries.

I have tried several of them, and of those Jaro-Winkler gives fairly good results, as shown below.

compareStrings("elon musk","elon musk")    --> 1.0
compareStrings("elonmusk","elon musk")     --> 0.98
compareStrings("elon mush","elon musk")    --> 0.99
compareStrings("eln msuk","elon musk")     --> 0.94
compareStrings("elon","elon musk")         --> 0.89
compareStrings("musk","elon musk")         --> 0.0  // This is bad, but I can fix that.
compareStrings("mr elon musk","elon musk") --> 0.81

The results above come from the implementation in the Apache Commons library. I would like to know whether there is a better implementation or algorithm for this purpose. Any help is appreciated.
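To illustrate the "musk" case, here is a minimal, self-contained sketch of the textbook Jaro-Winkler similarity, plus a hypothetical helper `bestTokenScore` that also compares the query against each whitespace-separated token of the name. The class and method names are my own and this is not the Apache Commons code, so scores for other inputs may differ from the ones listed above. With the textbook definition, "musk" scores 0.0 against "elon musk" because every matching character lies outside the allowed match window.

```java
final class NameSimilarity {

    // Textbook Jaro similarity in [0, 1].
    static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        int la = a.length(), lb = b.length();
        if (la == 0 || lb == 0) return 0.0;

        // Characters only count as matching if they are at most `window` positions apart.
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] matchedA = new boolean[la];
        boolean[] matchedB = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matchedB[j] && a.charAt(i) == b.charAt(j)) {
                    matchedA[i] = true;
                    matchedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order in the two strings.
        int k = 0, outOfOrder = 0;
        for (int i = 0; i < la; i++) {
            if (!matchedA[i]) continue;
            while (!matchedB[k]) k++;
            if (a.charAt(i) != b.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2.0; // transpositions
        return (m / la + m / lb + (m - t) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
    static double jaroWinkler(String a, String b) {
        double j = jaro(a, b);
        int prefix = 0;
        int limit = Math.min(4, Math.min(a.length(), b.length()));
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    // Hypothetical helper: best score of the query against the full name or any single token.
    static double bestTokenScore(String query, String name) {
        double best = jaroWinkler(query, name);
        for (String token : name.trim().split("\\s+")) {
            best = Math.max(best, jaroWinkler(query, token));
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("musk", "elon musk"));
        System.out.println(bestTokenScore("musk", "elon musk"));
    }
}
```

The trade-off of taking the maximum over tokens is that a bare first name or surname becomes a perfect match for every full name containing it, so whether that is acceptable depends on how the score is used.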

Edit: @newuserua_ext @Thrasher Thanks, I appreciate your time. I have gone through the related StackExchange Q&A and posted this question to focus specifically on person names.

Vamsidhar
  • When you downvote, please mention the reason. I posted this because I needed help; I couldn't find anything better on the Internet. – Vamsidhar Dec 09 '16 at 04:55
  • Check this out (overview section): https://github.com/tdebatty/java-string-similarity. Good luck! – Rcordoval Dec 09 '16 at 04:56
  • @Thrasher Thank you for the link :) As I mentioned, my question is very particular to person names. – Vamsidhar Dec 09 '16 at 04:59
  • "The Jaro–Winkler distance metric is designed and best suited for short strings such as person names, and to detect typos." – Rcordoval Dec 09 '16 at 05:08
  • @Thrasher Thank god. Someone is finally understanding. Exactly, I am looking for a better algorithm for validation of "Person names". – Vamsidhar Dec 09 '16 at 05:10
  • I found a paper on finding and matching personal names: [Techniques and Practical Issues](https://pdfs.semanticscholar.org/654d/51abeb59861dde5f8097127a5b5a12147f9f.pdf) – Rcordoval Dec 09 '16 at 05:47
  • I found something similar; maybe it will help you: http://stackoverflow.com/questions/955110/similarity-string-comparison-in-java – newuserua_ext Dec 09 '16 at 06:47

2 Answers


Consider Double Metaphone. We use it successfully to find "sounds-like" matches to names. You can find an implementation for Java in Apache Commons:

https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html

Dan Armstrong

One possibility is the Levenshtein distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. It can be computed with dynamic programming in time proportional to the product of the two string lengths, but it is not really suitable for determining phonetic similarity.
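As a rough illustration of the dynamic-programming evaluation, here is a minimal sketch that keeps only two rows of the table. The class name `EditDistance` and the normalised `similarity` helper are illustrative assumptions, not part of any library.

```java
final class EditDistance {

    // Levenshtein distance: minimum number of single-character insertions,
    // deletions, and substitutions that turn `a` into `b`.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of `a` into the first j characters of `b` takes j insertions.
        for (int j = 0; j <= b.length(); j++) prev[j] = j;

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of `a`
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: maps the distance to a score in [0, 1], where 1.0 means identical.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("elon mush", "elon musk"));
        System.out.println(similarity("elon mush", "elon musk"));
    }
}
```

Because plain Levenshtein counts a swap of two adjacent letters as two edits, a typo such as "msuk" for "musk" is penalised more heavily than a single wrong letter.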

Codor