1

Quick question on LIKE searches. If I have a column with a value 'Analyst,Trainer' and another column 'Workflow,Analyst,Tester,Trainer', I want to check if the values in the first column are in the second column. We can easily see that the answer is yes but they're not in order so we cant just do a like comparison.

Any ideas how to figure this one out? Probably write a function that passes in both values and does a split routine on both? Seems like there might be an easier way.

Thanks for the ears.

David

SQB
  • 3,926
  • 2
  • 28
  • 49
David Whitten
  • 573
  • 1
  • 4
  • 12
  • i'm not sure but you can look for [FIND_IN_SET](http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set) – rcpayan May 24 '13 at 23:10
  • 1
    SQL is designed for row based data and can do massive joins easily. You might find it better to explode out the csv info into multiple rows first and just do a simple join. – Preet Sangha May 24 '13 at 23:11
  • This is Sql Server by the way – David Whitten May 24 '13 at 23:26
  • 1
    My idea is to normalize your database. If you don't know what that means, I've heard good things about the book, Database Design for Mere Mortals. – Dan Bracuk May 25 '13 at 01:32

3 Answers3

1

While this is probably better to do with scripting, it was interesting to try with SQL.

First you need a row identifier -- this uses ROW_NUMBER(). Next you can look into using CROSS APPLY to split your comma delimited list up. Then you can compare the lists accordingly.

Here is a working example:

WITH YOURTABLE_WITH_RN AS (
  SELECT ROW_NUMBER() OVER (ORDER BY col1) rn, 
    col1, 
    col2
  FROM YourTable
  ),
YOURTABLE_COL1 AS (
  SELECT rn,
    Split.a.value('.', 'VARCHAR(100)') AS col1 
  FROM  
  (
     SELECT ROW_NUMBER() OVER (ORDER BY col1) rn,
       CAST ('<M>' + REPLACE(col1, ',', '</M><M>') + '</M>' AS XML) AS col1
     FROM  YourTable
  ) AS A 
  CROSS APPLY col1.nodes ('/M') AS Split(a)
  ), 
YOURTABLE_COL2 AS (
  SELECT rn,
    Split.a.value('.', 'VARCHAR(100)') AS col2 
  FROM  
  (
     SELECT ROW_NUMBER() OVER (ORDER BY col1) rn,
       CAST ('<M>' + REPLACE(col2, ',', '</M><M>') + '</M>' AS XML) AS col2
     FROM  YourTable
  ) AS A 
  CROSS APPLY col2.nodes ('/M') AS Split(a)
  ) 
SELECT T.rn, 
  T.col1, 
  T.col2, 
  CASE WHEN T2.rn IS NULL THEN 'Y' ELSE 'N' END AllIncluded
FROM YOURTABLE_WITH_RN T
   LEFT JOIN (
    SELECT T.rn
    FROM YOURTABLE_COL1 T
        LEFT JOIN YOURTABLE_COL2 T2 ON T.rn = T2.rn AND T.col1 = T2.col2
    WHERE T2.rn IS NULL
   ) T2 ON T.rn = T2.rn

SQL Fiddle Demo

You wouldn't need the first CTE (Common Table Expression) or to use the ROW_NUMBER() if you have a unique identifier in your table already.

sgeddes
  • 62,311
  • 6
  • 61
  • 83
0

Yeah, I did venture in this way a bit then thought about the performance when dealing with a huge dataset. I actually did some chess programming a while ago and think that maybe converting those values like 'Analyst,Tester,Trainer' what have ya into bit values in the respective table so the value you see here could be 7 (1,2,4) and then maybe 'Analyst,Trainer' might be 5 (1,4) so if I wanted to test if 'Analyst,Trainer' was within 'Analyst,Tester,Trainer' I could test (7 & 5) = 5. This seems like it will work. This is how we can test if a chess piece is on an attacking square. Not sure why this didn't occur to me earlier. Thanks for the comments though.

David Whitten
  • 573
  • 1
  • 4
  • 12
0

You can split a column with 'Analyst,Trainer' value, using XQuery methods(nodes, value) and then compare with the 'Workflow,Analyst,Tester,Trainer' value, using the LIKE operator.

SELECT t.col1, t.col2, MIN(CASE WHEN t.col2 LIKE '%' + o.col1 + '%'
                                THEN 1 ELSE 0 END) AS IsMatch
FROM YourTable t 
  CROSS APPLY 
    (
     SELECT Split.a.value('.', 'nvarchar(100)')
     FROM (       
           SELECT CAST('<M>'+REPLACE(t.col1,',','</M><M>')+'</M>' AS xml) AS col1
           ) AS a
     CROSS APPLY col1.nodes('/M') AS Split(a)
     ) o(col1)
GROUP BY t.col1, t.col2

Demo on SQLFiddle

Also you can use option with dynamic management function sys.dm_fts_parser Before script execution you need check full-text component is installed:

SELECT FULLTEXTSERVICEPROPERTY ('IsFulltextInstalled')

0 = Full-text is not installed. 1 = Full-text is installed. NULL = Invalid input, or error.

If 0 = Full-text is not installed then this post is necessary to you How to install fulltext on sql server 2008?

SELECT t.col1, t.col2, MIN(CASE WHEN t.col2 LIKE '%' + o.col1 + '%'
                                THEN 1 ELSE 0 END) AS IsMatch
FROM YourTable t 
  CROSS APPLY (
               SELECT display_term
               FROM sys.dm_fts_parser('"'+ 'nn,' + t.col1 + '"', 1033, NULL, 0)  
               WHERE display_term NOT LIKE 'nn%'
               ) o(col1)
GROUP BY t.col1, t.col2

For avoiding sorting operation, use index:

CREATE INDEX x ON YourTable(col1, col2)

FYI, The best performance will be with the sys.dm_fts_parser

Community
  • 1
  • 1
Aleksandr Fedorenko
  • 16,594
  • 6
  • 37
  • 44