I have data coming from two different systems which I input into two different tables in SQL Server 2016. These tables are related with a Key.
Table1
Key Name
------------------------
1 Jim Parsons
2 Steven Woz
3 Billy Jean
4 Carol Henderson
5 Sara Hauder
Table2
Key Name
---------------------------
1 Jimmy Parsons
2 Steven Wozniak
3 Bill Jean
4 Andy Smith
5 Sara Hauder (a)
What I am hoping to develop is an additional column that will tell me if the Name
s match. However, if the names are similar, I want to consider it a match.
In order to accomplish this I was thinking I could do something like
select
*,
case when a.Name like b.Name 1 else 0
end as Flag
from
(Table1 a
join
Table2 b on a.Key = b.Key)
But obviously this wouldn't work because even the slightest difference can cause a like statement to be false.
I'm not very familiar with how to find the longest common substring, but I don't actually care what the string is. What I'm thinking is I need to determine the average number/percentage of characters shared by the two fields and then say: if the number of characters shared between a.Name and b.Name is above or equal to the average, then 1 else 0. Not really sure how to go about this however...
AS an example: Sara Hauder
should match with Sara Hauder (a)
, but Carol Henderson
should not match with Andy Smith
. Note that many Name
s will sometimes have some different tags at the end in parenthesis which can be filtered out if necessary. However, I am perfectly capable of adding this feature myself if you choose to ignore.