I asked a similar question [Querying database to find potential duplicates based on multiple columns and it was kindly answered by master of SQL Gordon Linoff.
All in all I have about 100k client records, only to find out now that there's potentially a lot of duplicate records - in most cases, first and last name are misspelled.
Here's the query that Gordon wrote for me...
select t
from (select t.*,
count(*) over (partition by soundex(fname), soundex(lname), dob) as cnt
from t
) t
where cnt > 1
order by dob, fname, lname;
And it works great - however it only seems to pull up very close matches. I was trying to update it to only try to match up the first three letters, but it doesn't seem to be doing a good job, it gives me a lot more records with the same DOB, however, the first/last name values are not only the first 3 letters, but a mix match basically.
So here's what I tried to do in the count(*) line
count(*) over (partition by substring(soundex(fname),0,3), substring(soundex(lname),0,3), dob) as cnt
but like I said it doesn't pull up exact 3 character matches, so I get first names such as
Ana
Annette
Is there any way that I can update it to be able to dig down a bit deeper in finding my duplicates but only use the first 3 characters of first and lastname, while still using Soundex? Or potentially not use Soundex?