I have a problem like this at work:
Column Code
has the value like, 1000, 1200, A1000, B1200, AAA, BBB, etc. Currently it is separated by spaces, sometimes more than one due to poor data input. And I am trying to check if a record contain a code that I am interested in.
Interested_Code
: 1000 or A1000 or 444 or 555 or A555 etc.
I know a simple solution from this answer:
A.CODE LIKE CAT('% ', T3.Interested_Code, ' %')
I have appended a leading and trailing space to A.CODE
to ensure a "full" exact match are returned. Because if I simply do
A.CODE LIKE CAT('%', T3.Interested_Code, '%') or
A.CODE CONTAINS T3.Interested_Code
I will get a false positive for code = 1000
at a row contained code = A1000
, this matches part of the code, but not necessary a correct result.
My code works above, but it is doing too many test and really slow. Is there a faster or smarter way in PROC SQL? The main table is about 100k rows, and each row has around 10-20 codes. The interested code is about 8k values. Thanks.