I'm trying to normalize a table based on addresses and account name. The issue is that addresses may be truncated, or abbreviated, or have errors. I did a group by address for the account, which got me closer, but I have thousands of records with dups.
I want the addresses to be grouped and to have a unique account name for each group of addresses. I'm going to take the duplicates and add them has contacts for the account using the account as my fk.
Here's what I have so far, but it isn't nearly close enough. Someone has had to of done this before. I very much appreciate any tips.
SELECT *,
--SELECT DISTINCT k.Placeholder,f.Address_1,
CASE
WHEN k.Placeholder IS NULL
THEN Clinic_Name
ELSE k.Placeholder
END AS AccountName
FROM [FL_Data].[dbo].[26K] f
LEFT JOIN
( SELECT fl.Address_1, 'Placeholder-' + CAST(MIN(fl.id) as varchar) as Placeholder
FROM [FL_Data].[dbo].[26K] fl
GROUP BY fl.Address_1
having count(*) > 1
) k
ON f.Address_1 = k.Address_1
Here are some examples of the addresses:
3011 NW 63rd St
3011 NW 63rd Street
3013 Winghaven
3013 WINGHAVEN BLVD
301 Northlake Ave. Ste-101
301 Northlake Avenue Ste.101