-1

I have 2 columns that I am trying to see if there is a partial match between two strings. Column A has string: 0C000702AA-G and Column B has string S0C000702AB-DI. I did try:

CASE WHEN ColumnA LIKE '%' + ColumnB + '%' THEN '1' ELSE '0' END AS 'Match'

but it returns a 0. Is there a better way to see if there is almost a match?

Column A = 0C000702AA-G and Column B = S0C000702AB-DI. As you can see Column B is almost the same as A, B has prefix of 'S' and ends with 'AB-DI'. The result should return 1 because the part in the middle '0C000702AA' is the same both sides.

I just tested:

CASE WHEN '%' + ColumnA + '%' LIKE '%' + ColumnB + '%' THEN '1' ELSE '0' END AS 'Match'

Still returns 0

Ruan du Preez
  • 91
  • 1
  • 3
  • 12
  • 1
    That logic is never going to work, because ColumnA is not fully contained within ColumnB. In order to come up with a solution you need to be more exact about how much of the string in ColumnA needs to match part of ColumnB. – Dale K Jul 28 '21 at 06:26
  • do you mean I should maybe try substring match? I think would work but the problem is that not all the codes are the same length. – Ruan du Preez Jul 28 '21 at 06:30
  • 1
    I don't know what you should try until you can be more specific about your matching requirements. What exactly qualifies as a "partial match"? – Dale K Jul 28 '21 at 06:30
  • @RuanduPreez, Gordon is right. as my answer does not directly answer your question, please un-accept my answer, so somebody might provide a better solution to help you. – Venkataraman R Jul 28 '21 at 11:30

1 Answers1

2

You can utilize DIFFERENCE function, which compares the SOUNDEX values of the two strings. If difference is 0, then no similarity. If the difference is 4, they are very similar. Read more about SOUNDEX and DIFFERENCE

CAVEAT: The comparison is based on how the strings sound. So, it will not be very suitable for your needs, as you have got a identifier kind of thing, including digits.

DECLARE @table table(columnA CHAR(100), ColumnB CHAR(100))

INSERT INTO @table values 
('0C000702AA-G','S0C000702AB-DI')

SELECT SOUNDEX(ColumnA) as columnASoundex, SOUNDEX(columnB) as ColumnBSoundex, 
DIFFERENCE(ColumnA,ColumnB) as Similarity from @table
columnASoundex ColumnBSoundex Similarity
0000 S000 3

But, if you want to go for even detailed comparison, you can use a CLR stored procedure leveraging C# fuzzy matching libraries like fuzzystring. Also refer to SO post fuzzy matching in C#

UPDATE As OP confirmed, the above approach works only in some cases. So, OP has to figure out a better approach, which would suit all his needs.

Venkataraman R
  • 12,181
  • 2
  • 31
  • 58
  • Thanks for the answer. I tried it and some works well and others don't. There is places in my project where I can definitely use this function. Thanks – Ruan du Preez Jul 28 '21 at 06:53
  • @RuanduPreez, in that case, you have go for detailed fuzzy matching, using CLR stored procedure to suit your needs using C# fuzzy matching. – Venkataraman R Jul 28 '21 at 06:54
  • @RuanduPreez, sure. If this answers your question, please mark it as answer, it will be helpful for others in future. – Venkataraman R Jul 28 '21 at 06:58
  • @GordonLinoff, sorry. I thought that OP is going to use CLR approach using C# fuzzy logic to suit his needs and has accepted the answer. I agree with you. I will send message to OP to revert his decision. – Venkataraman R Jul 28 '21 at 11:29