I am trying to do fuzzy string matching to get as more matches as I can. First I execute the Levenshtein Distance Algorithm (http://www.kodyaz.com/articles/fuzzy-string-matching-using-levenshtein-distance-sql-server.aspx) and store it as "distance" in my dbo.
My first table (t1) looks like this:
Name | Synonym
A | A1
A | A2
A | A3
B | B1
B | B2
My second table (t2) looks like this: The ID field may look like Name and Synonym very much
ID | Description
A | XXX
B | YYY
My goal is to make left joins either on the Name or its Synonyms when the distance between 2 strings from each table (t1 and t2) are smaller than 2.
Here is my current work:
SELECT *
FROM (
SELECT t2.ID, ROW_NUMBER() over (partition by ID order by ID) as rn
FROM table1 as t1
LEFT JOIN table2 as t2
ON (upper(trim(t1.Name)) = upper(trim(t2.ID)) OR upper(trim(t1.Synonym)) = upper(trim(t2.ID)))
WHERE (dbo.distance(t1.Name,t2.ID)<=2 OR dbo.distance(t1.Synonym,t2.ID)<=2)
) temp
WHERE rn=1
Ideally, as long as their distance is smaller than 2, we will still doing the join.
It should get more matches by adding that condition, however it doesn't.
Am I missing anything here?
I was wondering if my problem is coming from this:
My intention is to see if the conditions meet, if so then just do the join. But my code here probably tells SQL to "join first", and the filter it afterwards.
Is there a way to let it see if the condition qualifies and then do the join "after"?