I need to do a fuzzy name comparison for two names and see if they match. Think of it like you've written a check for "Frank A. Anderson" and when the check is deposited it changed to "Frank Smith". The goal is to see how close the two names match.
I discovered the function Difference()
in SQL Server and it seems like that does the job by comparing two strings and return how similar they are. However, I am having a problem understanding why the below query returns the same result:
SELECT
DIFFERENCE ('Frank A. Anderson', 'Frank Anderson') AS Good,
DIFFERENCE ('Frank A. Anderson', 'Frank Smith') AS Bad
Both Good and Bad returns "4", which I understand is a match. For the first one, sure, missing a middle name "A" is OK. But for the second one, not only it's missing the middle name A, it has a completely different last name. How come it also returns a 4? How can these two comparison returns the same result?
If Difference()
is not the right method I need for my purpose, what other suggestions you have? I hate to do a straight string equal as a comma/dot/space would ruin the comparison.
Any help would be appreciated.