I have a database of over a million contacts and need to return the best matches for (a) user queries and (b) batch jobs that run periodically. There is not much debate that person-name matching is complex, and I am considering three routes:
1. Roll our own (something basic to get us out of the blocks). There are lots of good threads on this topic, such as "How to calculate score for Metaphone/Soundex name searching in .net".
2. Leverage Azure Search / Cognitive Skills: our platform is already built in Azure, and using Azure Search would potentially be less work than (1) and a smaller jump than (3).
3. Look to third parties outside Azure that specialise in person-name matching (NetOwl, Basistech, etc.).
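For reference, a minimal sketch of what the "roll our own" baseline might start from: American Soundex plus a simple score. The positional-agreement score (`soundex_similarity`) is my own illustrative assumption, not the method from the thread mentioned above, and both function names are hypothetical.

```python
# American Soundex: letters mapped to digit classes; vowels, H, W, Y have no code.
_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if no ASCII letters."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]            # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:    # skip adjacent duplicates of the same digit
            result.append(code)
        if c not in "HW":            # vowels separate duplicates; H and W do not
            prev = code
    return ("".join(result) + "000")[:4]


def soundex_similarity(a: str, b: str) -> float:
    """Illustrative score: fraction of the 4 code positions that agree."""
    ca, cb = soundex(a), soundex(b)
    if not ca or not cb:
        return 0.0
    return sum(x == y for x, y in zip(ca, cb)) / 4
```

`soundex("Robert")` and `soundex("Rupert")` both return `"R163"`, but "Jesus" (`J220`) and "Heyzeus" (`H220`) differ in the retained first letter, so this score gives 0.75 rather than a clean match. That is one reason a basic phonetic code alone probably won't cover the cases listed below.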
Given that our scope is matching Western-style personal names, can someone give me the pros and cons of using Azure Search for this? Here are some of the classes of issues I hope we can address:
- Phonetic similarity: Jesus <=> Heyzeus
- Transliteration spelling differences: Abdul Rasheed <=> Abd al-Rashid
- Alternate names: William <=> Will <=> Bill <=> Billy
- Missing spaces or hyphens: MaryEllen <=> Mary Ellen <=> Mary-Ellen
- Truncated name components: McDonalds <=> McDonald <=> McD
- Optional name tokens: Joaquín Archivaldo Guzmán Loera <=> Joaquín Guzmán
- Name order variations: Park Sol Mi <=> Sol Mi Park
- Initials: J. E. Smith <=> James Earl Smith
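To make several of these issue classes concrete, here is a rough, order-insensitive token-matching sketch using only the Python standard library. `NICKNAMES`, `normalise`, `tokens_match` and `name_score` are hypothetical names; the nickname table is a tiny stub; prefix matching is an assumed heuristic for initials and truncation and is quite permissive. It does not attempt phonetic similarity or transliteration.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical, deliberately tiny nickname table; a real one would be much larger.
NICKNAMES = {"will": "william", "bill": "william", "billy": "william"}


def normalise(name: str) -> list[str]:
    """Fold accents, lowercase, split on spaces/hyphens/dots, map nicknames."""
    decomposed = unicodedata.normalize("NFKD", name)
    folded = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    tokens = [t for t in re.split(r"[\s.\-]+", folded) if t]
    return [NICKNAMES.get(t, t) for t in tokens]


def tokens_match(a: str, b: str) -> bool:
    """Prefix match: covers initials (j ~ james) and truncation (mcd ~ mcdonald)."""
    return a.startswith(b) or b.startswith(a)


def name_score(query: str, candidate: str) -> float:
    """Share of the shorter name's tokens matched, ignoring order (0.0 to 1.0)."""
    q, c = normalise(query), normalise(candidate)
    if not q or not c:
        return 0.0
    # Missing spaces or hyphens: MaryEllen vs Mary Ellen join to the same string.
    if "".join(q) == "".join(c):
        return 1.0
    # Extra tokens on the longer side are treated as optional (middle names etc.).
    short, long_ = sorted((q, c), key=len)
    unused = list(long_)
    matched = 0
    for tok in short:
        # Greedy one-to-one alignment in any order handles Park Sol Mi vs Sol Mi Park.
        for i, other in enumerate(unused):
            if tokens_match(tok, other):
                matched += 1
                del unused[i]
                break
    return matched / len(short)
```

With this sketch, Billy/William, MaryEllen/Mary-Ellen, McDonalds/McD, the Guzmán pair, Park Sol Mi/Sol Mi Park and J. E. Smith/James Earl Smith all score 1.0, while Abdul Rasheed/Abd al-Rashid scores only 0.5 and Jesus/Heyzeus scores 0.0. Normalising by the shorter name also means "John" vs "John Smith" scores 1.0, so a real implementation would likely need extra weighting.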
Thanks in advance for any guidance and help. Simon.