I have a database that sometimes stores duplicate rows, however the duplicate is not clear cut, e.g. the following two column values would be a duplicate:
G12345 & G1234 --> because they are very similar
(a string comparison shows that the characters match 83.3%).
I need some help writing an SQL
query that would retrieve values that are very similar to a string sent as part of the query, e.g. over 50% of characters matched.
Can someone help with this? I have a C# method as follows but not quite sure how to accomplish this in SQL:
static double StringCompare(string a, string b)
{
if (a == b) //Same string, no iteration needed.
return 100;
if ((a.Length == 0) || (b.Length == 0)) //One is empty, second is not
{
return 0;
}
var maxLen = a.Length > b.Length ? a.Length : b.Length;
var minLen = a.Length < b.Length ? a.Length : b.Length;
var sameCharAtIndex = 0;
for (var i = 0; i < minLen; i++) //Compare char by char
{
if (a[i] == b[i])
{
sameCharAtIndex++;
}
}
return sameCharAtIndex / maxLen * 100;
}
Thanks in advance.