First, fuzzy lookup is a little vague. There are a number of algorithms that are used for fuzzy matching including the Levenshtein Distance, Longest Common Subsequence, and some others referenced in the "See Also" section of this Wikipedia page about Approximate String Matching.
To rephrase what you are attempting to do. You are updating the UNDERWRITER_CODE
column in MainTbl
with the UNDERWRITER_CODE
that matches the most similar UNDERWRITER_NAME
in LKP
. Fuzzy algorithms can be used for measuring similarity. Note my post here. For the sample data you provided we can use Phil Factor's T-SQL Levenshtein functions and match based on the lowest Levenshtein value like so:
SELECT TOP (1) WITH TIES
UNDERWRITER_CODE_NULL = m.UNDERWRITER_CODE,
LKP_UN = m.UNDERWRITER_NAME, l.UNDERWRITER_NAME, l.UNDERWRITER_CODE,
MinLev = dbo.LEVENSHTEIN(m.UNDERWRITER_NAME, l.UNDERWRITER_NAME)
FROM dbo.MainTbl AS m
CROSS JOIN dbo.LKP AS l
WHERE m.UNDERWRITER_CODE IS NULL
ORDER BY ROW_NUMBER() OVER (PARTITION BY m.UNDERWRITER_NAME
ORDER BY dbo.LEVENSHTEIN(m.UNDERWRITER_NAME, l.UNDERWRITER_NAME))
Returns:
UNDERWRITER_CODE_NULL LKP_UN UNDERWRITER_NAME UNDERWRITER_CODE MinLev
--------------------- ------------------ ------------------ ---------------- -----------
NULL dylan.campbell Dylan Campbell 2 1
NULL dylanadmin Dylan Campbell 2 8
NULL dylanc Dylan Campbell 2 8
NULL scott.noffsinger Scott Noffsinger 1 1
We can use this logic to update UNDERWRITE_CODE
like so:
WITH FuzzyCompare AS
(
SELECT TOP (1) WITH TIES
UNDERWRITER_CODE_NULL = m.UNDERWRITER_CODE,
LKP_UN = m.UNDERWRITER_NAME, l.UNDERWRITER_NAME, l.UNDERWRITER_CODE,
MinLev = dbo.LEVENSHTEIN(m.UNDERWRITER_NAME, l.UNDERWRITER_NAME)
FROM dbo.MainTbl AS m
CROSS JOIN dbo.LKP AS l
WHERE m.UNDERWRITER_CODE IS NULL
ORDER BY ROW_NUMBER() OVER (PARTITION BY m.UNDERWRITER_NAME
ORDER BY dbo.LEVENSHTEIN(m.UNDERWRITER_NAME, l.UNDERWRITER_NAME))
)
UPDATE fc
SET fc.UNDERWRITER_CODE_NULL = fc.UNDERWRITER_CODE
FROM FuzzyCompare AS fc
JOIN dbo.MainTbl AS m ON fc.UNDERWRITER_NAME = m.UNDERWRITER_NAME;
After this update SELECT * FROM dbo.mainTbl
Returns:
UNDERWRITER_CODE UNDERWRITER_NAME
---------------- -------------------
2 dylan.campbell
2 dylanadmin
2 dylanc
2 Dylan Campbell
2 dylan.campbell
2 dylanadmin
1 scott.noffsinger
1 Scott Noffsinger
This should get you started; depending on the amount & kind of data you are dealing with, you will need to be very selective about what algorithms you use. Do your homework and test, test ,test!
Let me know if you have questions.