1

I have a table where two columns have values separated by a comma, and another table of categories. group_1 can have anything up to 30 separated values and group_2 can have anything up to 5 separated values

myProducts

alias group_1 group_2
product_a 1,2,3,[...] uk, us [...]
product_b 2,4,[...] uk, us, [...]
product_c 1,4,[...] spain, germany, [...]

myCategories

id category
1 category_a
2 category_b
3 category_c
4 category_d

Is it possible with MySQL to split both the comma separated fields into multiple rows and return the results after looking up the value from the categories table. So in the example above, the FIRST row of the original table would return:

alias group_1 group_2
product_a category_a uk
product_a category_a us
product_a category_b uk
product_a category_b us
product_a category_c uk
product_a category_c us
... ... ...

The lookup part is desired, but if that proves to be too complicated, I can live without that part.

TheMaster
  • 45,448
  • 6
  • 62
  • 85
Typhoon101
  • 2,063
  • 8
  • 32
  • 49

1 Answers1

2

Yes, but you won't be happy with the performance.

You can match a comma-separated list against an individual value using MySQL's FIND_IN_SET() function.

select p.alias, a.category as group_1, c.country as group_2 
from myProducts p join myCategories a on find_in_set(a.id, p.group_1) 
join countries c on find_in_set(c.country, p.group_2);

+-----------+------------+---------+
| alias     | group_1    | group_2 |
+-----------+------------+---------+
| product_c | category_a | spain   |
| product_c | category_d | spain   |
| product_a | category_a | uk      |
| product_a | category_b | uk      |
| product_b | category_b | uk      |
| product_a | category_c | uk      |
| product_b | category_d | uk      |
+-----------+------------+---------+

I did create another lookup table countries:

create table countries (country varchar(20) primary key);
insert into countries values ('uk'),('us'),('spain'),('germany');

Caveat: if the comma-separated list has spaces, they will be treated as part of each string in the list, so you want to remove spaces.

select p.alias, a.category as group_1, c.country as group_2 
from myProducts p join myCategories a on find_in_set(a.id, p.group_1) 
join countries c on find_in_set(c.country, replace(p.group_2,' ',''));

+-----------+------------+---------+
| alias     | group_1    | group_2 |
+-----------+------------+---------+
| product_c | category_a | germany |
| product_c | category_d | germany |
| product_c | category_a | spain   |
| product_c | category_d | spain   |
| product_a | category_a | uk      |
| product_a | category_b | uk      |
| product_b | category_b | uk      |
| product_a | category_c | uk      |
| product_b | category_d | uk      |
| product_b | category_b | us      |
| product_b | category_d | us      |
+-----------+------------+---------+

But there's no way to optimize the lookups with indexes if you do this. So every join will be a table-scan. As your tables gets larger, you'll find the performance degrades to the point of being unusable.

The way to optimize this is to avoid using comma-separated lists. Normalize many-to-many relationships into new tables. Then the lookups can use indexes, and you'll avoid the degraded performance, in addition to all the other problems with using comma-separated lists.


Re your comment:

You can create a derived table by listing countries explicitly:

FROM ...
JOIN (
  SELECT 'us' AS country UNION SELECT 'uk' UNION SELECT 'spain' UNION SELECT 'germany'
) AS c

But this is getting pretty ridiculous. You aren't using SQL to any advantage. You might as well just fetch the whole dataset back into your client application and sort it into some data structures in memory.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thanks, this is working perfectly for the group_1 column. However, I don't have the ability to create another 'countries' table for group_2. this table is very un-normalized and I don't have write access. Could your answer be modified slightly to split on the comma separated values in group_2? – Typhoon101 Jan 17 '22 at 17:08
  • That works perfectly. Thanks so much for your help. I know it is a crazy scenario. We didn't build this database, we inherited it from a previous company. Believe me, the database schema is one of the better components to this system. We are now rebuilding the application from scratch, but I need to extract data as a one-off reconciliation exercise. Thanks again – Typhoon101 Jan 17 '22 at 23:27