I am facing a problem while computing string similarity.
Scenario: each string consists of the following fields: first_name, middle_name and last_name.
What I have to do is find the similarity between two such strings, A and B (both have the same fields), while making sure all possibilities are considered.
Case 1: say string A has first name: Rahul, middle name: Kumar, last name: " "
and string B has first name: Kumar, middle name: " ", last name: Rahul.
Looking at them, we can say both names might be the same, but the current similarity algorithms give only around 71% similarity.
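One way to see why Case 1 scores low: a whole-string metric compares the names in field order, so "Rahul Kumar" and "Kumar Rahul" look different even though they contain exactly the same tokens. A minimal sketch, using Python's standard-library difflib.SequenceMatcher purely as a stand-in for whatever metric you use (it is not one of the algorithms mentioned in this question): sorting the tokens before comparing, often called a token-sort comparison, makes the score independent of which field each token was stored in. The helper names are hypothetical.

```python
from difflib import SequenceMatcher


def name_tokens(*fields):
    """Join the fields, lower-case, drop dots, and split; blank fields such as " " vanish."""
    return " ".join(fields).lower().replace(".", " ").split()


def field_order_similarity(a, b):
    """Whole-name comparison with tokens kept in field order (first, middle, last)."""
    return SequenceMatcher(None, " ".join(name_tokens(*a)), " ".join(name_tokens(*b))).ratio()


def sorted_token_similarity(a, b):
    """Same comparison after sorting the tokens, so field placement no longer matters."""
    return SequenceMatcher(
        None, " ".join(sorted(name_tokens(*a))), " ".join(sorted(name_tokens(*b)))
    ).ratio()


a = ("Rahul", "Kumar", " ")
b = ("Kumar", " ", "Rahul")
print(field_order_similarity(a, b), sorted_token_similarity(a, b))
```

For Case 1, sorted_token_similarity returns 1.0 while field_order_similarity stays below 1.0. Sorting alone does not help with Case 2, because "K." is still a different token from "Kumar".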
Case 2:
Say string B has first name: Rahul, middle name: " ", last name: K.
In this case the similarity percentage drops, even though the name might be the same.
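For Case 2 the missing piece is that an initial should be allowed to stand for a full token. A rough sketch, assuming that a one-letter token matches any token starting with that letter and that such a match is worth a hypothetical INITIAL_SCORE of 0.9 (an arbitrary value you would tune on labelled pairs): each token is scored against its best counterpart in the other name, in both directions, and the two averages are combined.

```python
INITIAL_SCORE = 0.9  # hypothetical weight for an initial such as "K." matching "Kumar"


def tokens(*fields):
    """Join the name fields, lower-case, drop dots, and split; blank fields vanish."""
    return " ".join(fields).lower().replace(".", " ").split()


def token_score(x, y):
    """1.0 for identical tokens, INITIAL_SCORE for an initial vs a token it abbreviates, else 0.0."""
    if x == y:
        return 1.0
    if (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y)):
        return INITIAL_SCORE
    return 0.0


def best_match_similarity(a, b):
    """Average each token's best score in the other name, in both directions."""
    ta, tb = tokens(*a), tokens(*b)
    if not ta or not tb:
        return 0.0
    a_to_b = sum(max(token_score(x, y) for y in tb) for x in ta) / len(ta)
    b_to_a = sum(max(token_score(y, x) for x in ta) for y in tb) / len(tb)
    return (a_to_b + b_to_a) / 2


print(best_match_similarity(("Rahul", "Kumar", " "), ("Rahul", " ", "K.")))
```

For A = Rahul Kumar and B = Rahul K. this gives 0.95, while Rahul Kumar vs Rahul Sharma gives 0.5. Because each token simply takes its best counterpart, a token in the other name can be reused, so "Rahul R." gets credit against "Rahul Kumar" for "R." matching "Rahul"; a one-to-one pairing of tokens avoids that.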
How should I compute the similarity so that all possible combinations of first, middle and last name are considered, and in an optimised fashion?
E.g. Rahul Rakesh Kumar can be written as Rahul Kumar, Rahul, Rahul K., Kumar Rahul, Kumar Rakesh Rahul, Rahul R. Kumar, etc.
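Considering all possible combinations does not necessarily mean generating every written variant of a name. Since a name here has at most three fields, it may be enough to try every one-to-one pairing between the tokens of the shorter name and the tokens of the longer one and keep the best; itertools.permutations yields at most six such pairings, so the cost per comparison stays constant. A sketch, assuming a hypothetical rule that a one-letter token matches a token starting with that letter with weight INITIAL_SCORE = 0.9, and scoring the best pairing as an average over the shorter name so that an abbreviated form such as "Rahul Kumar" can fully match "Rahul Rakesh Kumar".

```python
from itertools import permutations

INITIAL_SCORE = 0.9  # hypothetical weight for an initial such as "K." matching "Kumar"


def tokens(*fields):
    """Join the name fields, lower-case, drop dots, and split; blank fields vanish."""
    return " ".join(fields).lower().replace(".", " ").split()


def token_score(x, y):
    """1.0 for identical tokens, INITIAL_SCORE for an initial vs a token it abbreviates, else 0.0."""
    if x == y:
        return 1.0
    if (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y)):
        return INITIAL_SCORE
    return 0.0


def aligned_similarity(a, b):
    """Best one-to-one pairing of tokens, averaged over the shorter name."""
    ta, tb = tokens(*a), tokens(*b)
    if not ta or not tb:
        return 0.0
    if len(ta) > len(tb):
        ta, tb = tb, ta  # ta is now the shorter token list
    # With at most three fields per name, this sees at most 6 candidate pairings.
    best = max(
        sum(token_score(x, y) for x, y in zip(ta, perm))
        for perm in permutations(tb, len(ta))
    )
    return best / len(ta)


target = ("Rahul", "Rakesh", "Kumar")
variants = [
    ("Rahul", " ", "Kumar"),
    ("Rahul", " ", " "),
    ("Rahul", " ", "K."),
    ("Kumar", " ", "Rahul"),
    ("Kumar", "Rakesh", "Rahul"),
    ("Rahul", "R.", "Kumar"),
]
for v in variants:
    print(" ".join(tokens(*v)), round(aligned_similarity(target, v), 3))
```

Against Rahul Rakesh Kumar every variant in the list scores at least 0.9 (Rahul K. scores 0.95, Rahul R. Kumar about 0.97), whereas Rahul R. vs Rahul Kumar scores 0.5 because "R." cannot reuse the already-paired "Rahul". Averaging over the shorter name is deliberately lenient: a single-token record such as "Rahul" matches every record containing "Rahul", so for deduplication you may want to require a minimum number of matched tokens. If names could have many more tokens, scipy.optimize.linear_sum_assignment (with maximize=True) finds the best one-to-one pairing without enumerating permutations.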
I have tried the Jaccard, cosine and Jaro-Winkler similarity algorithms, but the results are not satisfactory.
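Jaro-Winkler was developed for comparing short strings such as personal names, so one option is to apply it per token rather than to the whole concatenated name, and combine the token scores through a best one-to-one pairing. A from-scratch sketch using the standard Jaro definition and Winkler's prefix boost (scaling factor p = 0.1, prefix capped at 4 characters); the function names are hypothetical, and this version omits the boost threshold that some implementations apply.

```python
from itertools import permutations


def jaro(s1, s2):
    """Jaro similarity: characters matched within a window, penalised for transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    out_of_order, j = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Boost the Jaro score for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def token_jw_similarity(a, b):
    """Best one-to-one token pairing scored with Jaro-Winkler, averaged over the shorter name."""
    ta = " ".join(a).lower().replace(".", " ").split()
    tb = " ".join(b).lower().replace(".", " ").split()
    if not ta or not tb:
        return 0.0
    if len(ta) > len(tb):
        ta, tb = tb, ta
    best = max(
        sum(jaro_winkler(x, y) for x, y in zip(ta, perm))
        for perm in permutations(tb, len(ta))
    )
    return best / len(ta)
```

Token-level scoring gives Rahul Kumar vs Kumar Rahul a score of 1.0, while Jaro-Winkler on the whole strings "rahul kumar" and "kumar rahul" stays below 1.0. On its own, Jaro-Winkler still rates "k" against "kumar" at only 0.76, so it combines naturally with an explicit rule for initials.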
Note: I need to deduplicate records based on names, which is why I have to consider all possible variations of a name.