0

I'm trying to write a query to look-up rows which contain one or more of comma-separated values. Please see scenario below:

Table: Events, target row to lookup is called "courses" The data in the courses column is a comma separated string of course codes.

  • ABC123,XYZ345,ABC987
  • ABC123
  • ABC123,ABC987
  • XYZ345,ABC123

Now what I want to do is to do a Select * from events where courses in ("ABC123","XYZ345"); However this returns no results.

What I want is that the where lookup to get me all rows that CONTAINS ABC123 OR XYZ345.

Could someone please help me do this?

ISDr
  • 209
  • 2
  • 7
  • 2
    You should not store data like this. Delimited data is a very bad practice. See https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad. Can data be restructured? – user3783243 Oct 18 '22 at 11:33
  • Does this answer your question? [mysql WHERE SET datatype contains item](https://stackoverflow.com/questions/5346274/mysql-where-set-datatype-contains-item) – Nico Haase Oct 18 '22 at 11:54
  • I'd prefer to convert CSVs to JSON arrays and to use JSON_OVERLAPS(). But the best clolution is to normalize the data and do not use CSV. – Akina Oct 18 '22 at 12:13
  • *Now what I want to do is to do a Select * from events where courses in ("ABC123","XYZ345");* Do you need in "at least one is present" or "each of is present"? – Akina Oct 18 '22 at 12:15

2 Answers2

0

You could use like, the % are wildcards.

select *
from events
where courses like "%ABC123%" or courses like "%XYZ345%"

There are other approaches: https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html

It might also be worth normalising the data so that each value is stored in its own row in another table rather than in a comma separated string, as using like will start to hurt performance as the table grows.

@user3783243 mentions loose matching which is a good point and just one reason why storing data like this isn't the best approach.

One way around it would be to tweak the query above to something like:

select *
from events
where courses like "%ABC123,%" or courses like "%XYZ345,%"

but this poses another problem, are the comma separated values always split by a single comma (ABC123,XYZ345) or is there any whitespace (ABC123, XYZ345).

Another problem pointed out by @GarethD is that the previous approaches won't match the last value of the comma separated string (unless it does have a trailing comma). One way I can think of is to do something like this, but it starts making the query a bit clunky, and also assumes all values are 6 characters in length, at this point, it might be worth using a regular expression.

select *
from events
where courses like "%ABC123,%" or courses like "%XYZ345,%"
or right(courses, 6) = 'ABC123' or right(courses, 6) = 'XYZ345'

If all values are indeed six characters then it might be worth trying other mysql functions such as locate, substring and regexp to try and simplify the query.

martincarlin87
  • 10,848
  • 24
  • 98
  • 145
  • What about loose matching? This doesn't have any CSV checks applied. e.g. `ABC1234,` would be matched as the same value as `ABC123,` with this answer. – user3783243 Oct 18 '22 at 11:32
  • good point, added an update. – martincarlin87 Oct 18 '22 at 11:38
  • Your solution also poses the problem that that it won't match the last item in the set i.e. `XYZ345,` won't be found in `ABC123,XYZ456` because there is no trailing comma. The *usual* solution for this (aside actually fixing the data and storing it properly) is adding a comma at the start and at the end of the text being searched, and doing the same to each of the search terms, e.g. `CONCAT(',', Courses, ',') LIKE '%,ABC123,%'` – GarethD Oct 18 '22 at 11:41
  • oh yeah, it's a bit messy with edge cases. Who knows, maybe the data is saved in a fashion where the last item does have a trailing comma, but between the answer and the comments, hopefully they get a spark on how to apply it to their situation, I can think of another way to handle the last item that I can add to my answer. – martincarlin87 Oct 18 '22 at 11:44
  • 1
    @GarethD I think solution should be https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_find-in-set but reallly solution is to fix the data set. With multiple values this function will have to be used numerous times, dropping efficiency. (encapsulated CSVs, spaced CSVs, etc. all will present issues) – user3783243 Oct 18 '22 at 11:44
0

Just use FIND_IN_SET()

SELECT columns FROM events WHERE FIND_IN_SET("ABC123", courses) OR FIND_IN_SET("XYZ345", courses)

Better of course would be a normalized table, since you can't use indexes with the current design.

Georg Richter
  • 5,970
  • 2
  • 9
  • 15