4

If I have a column called 'Categories' with say science,maths,english in the row comma-separated as shown, how would I match all rows with the category containing maths?

I've tried a simple LIKE but it is not quite accurate as there may be 'poo_science' which when searching for '%science%' would match both.

I've looked around StackOverflow and there are plenty of similar questions but all seem to want to return data as a comma separated list or something - not quite what I'm after.

I'd prefer not to use a stored procedure and cannot use full-text searching. I have a stored procedure I used which added another character ('$') around each value and then would search for '$value$'... is this too nasty? I'm after a little more simple method.

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
Alex Guerin
  • 2,336
  • 10
  • 36
  • 53
  • 9
    This is why multi-valued attributes are **evil**. – Yuck Feb 23 '12 at 22:09
  • @Yuck I thought the increase in concurrency issues was why they are **evil**. – Conrad Frix Feb 23 '12 at 22:10
  • 5
    There is no way I would let this design stand. You need a child table for the data so you can effectively qwuery it. Never store a comma delimeted list. – HLGEM Feb 23 '12 at 22:12
  • They are evil because you then have to split the string using a function per line and then search each of these or match on exact patterns that sometimes fall apart with special characters. This is expensive, you don't get any of the benefits of indexing, no normalization, and it's, well, ******* annoying. Sorry, the characters in stars are not an expletive, I accidentally pasted my password in ;-) – dash Feb 23 '12 at 22:16
  • Why on earth would you "prefer not to use a stored procedure"? Please read http://stackoverflow.com/questions/22907/which-is-better-ad-hoc-queries-or-stored-procedures and http://stackoverflow.com/questions/2734007/when-is-it-better-to-write-ad-hoc-sql-vs-stored-procedures for pre-existing debates. – Aaron Bertrand Feb 23 '12 at 22:24
  • 2
    @DavidStratton Can you elaborate on how `LIKE` clauses are vulnerable to SQL Injection? All I can imaging is someone entering a `%` to cause a table scan... – Michael Fredrickson Feb 23 '12 at 22:25
  • @Michael Fredrickson and Yuck - Thank you to both of you. You're right. I was looking at it all wrong. Comments withdrawn, and thank you for making me think!!!! I was thinking of this: http://www.gremwell.com/sql_injection_in_like_clause where it's easy to get it wrong. – David Feb 23 '12 at 22:36

6 Answers6

11

Disclaimer: The commentators are right... CSVs in a single field are a horrible design, and should be re-done.

With that said, here's how you can work around your problem:

Pad Categories with leading and trailing ,, that way you can include them in your wildcard search:

WHERE (',' + Categories + ',') LIKE '%,science,%'
usr
  • 168,620
  • 35
  • 240
  • 369
Michael Fredrickson
  • 36,839
  • 5
  • 92
  • 109
  • I fixed a tiny syntax error. This solution will work and it will be faster than the XML solution. (Of course, it will still require a table scan or index scan). – usr Feb 23 '12 at 22:25
8

Use FIND_IN_SET(,)

SQL:

SELECT name FROM orders,company
WHERE orderID = 1 
AND 
FIND_IN_SET(companyID, attachedCompanyIDs)

or can check this link FIND_IN_SET() vs IN()

Community
  • 1
  • 1
Amar Banerjee
  • 4,992
  • 5
  • 34
  • 51
  • Whoa! Excellent solution for comma delimited columns! Takes care of all exceptional cases as well. Should have been one of the accepted answers! – jahackbeth Oct 06 '14 at 08:27
  • 2
    FIND_IN_SET is MySQL only. Great solution, but not on SQL Server. – snort Nov 11 '14 at 00:30
3

I propose a 4x WHERE that can match any of the possible cases: value alone, value at the start, middle or end of the csv:

WHERE Categories = 'science'     /* CSV containing only the one value */
OR Categories LIKE 'science,%'   /* value at start of CSV */
OR Categories LIKE '%,science,%' /* value somewhere in the middle */
OR Categories LIKE '%,science'   /* value at the end of CSV */

This way all 'science' rows should be selected but none of the 'poo_science' rows.

FrankKrumnow
  • 501
  • 5
  • 13
2

This question is visible on google and has many views, so I want to share my approach to this problem. I had to deal with such a poor design as comma-separated values stored as strings too. I came across this issue while tweaking a CMS's plugin responsible for tags.

Yeah, tags related to a site article were stored like this: "tag1,tag2,...,tagN". So, getting the exact match wasn't as trivial as it might have initially appeared: using simple LIKE, with articles tagged "ball" I also got ones tagged "football" and "ballroom". Not critical, but rather annoying.

FIND_IN_SET function seemed awesome at first but then it turned out that it doesn't use index and doesn't work properly if the first argument contains a comma character.

I had no desire to alter the plugin itself or deeper CMS core functionality which that plugin had been built upon.

Also it is worth noting that needed tag (substring) can be the first, the last element in the string or can be somewhere in the middle, so this piece of code WHERE (',' + Categories + ',') LIKE '%,science,%' doesn't cover all three cases.

Finally, I ended up with very simple solution. It worked for me like this:

... WHERE tags LIKE 'ball,%' OR tags LIKE '%,ball,%' OR tags LIKE '%,ball'

All theree cases covered; commas used as delimiters. Hope it helps others who came across similar pitfall.

PS. I am not a MySQL/DB expert at all and I would love to read about potential drawbacks of this approach especially on really huge tables (which wasn't my case, btw). I just shared the results of my small research and what I did to solve this problem with minimal efforts.

curveball
  • 4,320
  • 15
  • 39
  • 49
2

I've made some assumptions about your data layout. Try this - using SQL Server 2K8+ this should work:

DECLARE @SearchString NVarChar(100) = 'maths';

SELECT 1 SomeId, 'science,maths,english' Categories
INTO #TestTable;

WITH R AS (
  SELECT
    X.SomeId,
    C.value('@value', 'NVarChar(100)') SomeTagValue
  FROM (SELECT SomeId,
          CONVERT(XML, '<tag value = "' + REPLACE(Categories, ',', '" /><tag value = "') + '" />') XMLValue
        FROM #TestTable) X CROSS APPLY X.XMLValue.nodes('//tag') T(C)
)
SELECT *
FROM R
WHERE SomeTagValue = @SearchString;

DROP TABLE #TestTable;

It's definitely not going to be super-efficient or very scalable, but then working against denormalized data tends to inherently have those issues.

Yuck
  • 49,664
  • 13
  • 105
  • 135
1

use FIND_IN_SET() mysql function

Syntax

SELECT * FROM as a WHERE FIND_IN_SET(value to search in string,comma separated string);

Example

SELECT * FROM as a WHERE FIND_IN_SET(5,"1,2,3,4,5,6");

More Information Follow Below Link :

http://blog.sqlauthority.com/2014/03/21/mysql-search-for-values-within-a-comma-separated-values-find_in_set/

RiggsFolly
  • 93,638
  • 21
  • 103
  • 149
Sujal Patel
  • 592
  • 2
  • 5
  • 14