0

To get the list of distinct values from table1 and column1 is as easy as doing this:

SELECT distinct(column1) 
FROM table1

However, I inherited (unfortunately) a database where column1 contains values separated by a comma

column1
--------
row 1: name1,name2
row 2: name2,name3
row 3: name4,name1,name3

I need to get the list of distinct values from column1, so it looks like this:

column1
--------
name1
name2
name3
name4

Any ideas?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
jjj
  • 2,594
  • 7
  • 36
  • 57
  • please read https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad mysql can't do this naturally you have to use functions or stored procedures for that – nbk Oct 04 '20 at 12:37
  • And note that DISTINCT is not a function – Strawberry Oct 04 '20 at 13:03
  • Please, please, read about database normalization. And please avoid comma-separated values in columns whenever possible. They systematically defeat SQL query optimizations. – O. Jones Oct 04 '20 at 13:06
  • 1
    please please stop talking about DB normalization, I can't fix problems someone else created... plus, someone already posted a very elegant solution to this issue – jjj Oct 04 '20 at 13:07

3 Answers3

3

A generic method uses a recursive query (available in MySQL 8.0 only):

with recursive 
    data as (select concat(column1, ',') rest from mytable),
    words as (
        select substring(rest, 1, locate(',', rest) - 1) word, substring(rest, locate(',', rest) + 1) rest
        from data
        union all
        select substring(rest, 1, locate(',', rest) - 1) word, substring(rest, locate(',', rest) + 1) rest
        from words
        where locate(',', rest) > 0
)
select distinct word from words order by word

Demo on DB Fiddle

Sample data:

| column1           |
| :---------------- |
| name1,name2       |
| name2,name3       |
| name4,name1,name3 |

Results:

| word  |
| :---- |
| name1 |
| name2 |
| name3 |
| name4 |
GMB
  • 216,147
  • 25
  • 84
  • 135
  • omg, such an elegant solution and it worked like a charm too... thank you – jjj Oct 04 '20 at 13:03
  • on my server, it took about 10s to extract and sort distinct values from 160k rows...very livable,...now that I have those values, I can easily improve my app – jjj Oct 04 '20 at 13:11
1

You have to split them apart. If you have at most three names in the column, then one method is:

select substring_index(column1, ',', 1) as name
from t
union -- on purpose to remove duplicates
select substring_index(substring_index(column1, ',', 2), ',', -1) as name
from t
where name like '%,%'
union -- on purpose to remove duplicates
select substring_index(substring_index(column1, ',', 3), ',', -1) as name
from t
where name like '%,%,%';
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

You can't. Your database does not adhere to the the first principle in designing normalized databases :- Atomicity. It says to store one and only one attribute in a column and yet you have so many. You need to retrieve the entire columns value, split and de-dupe them from your application. SQL cannot do this for you.

What you really need to do here is to have a seperate NAMES table and apply DISTINCT on name column after filtering relevant rows.

Nilesh PS
  • 356
  • 3
  • 8
  • It's not my database, so whomever not adhered to normalized principles, it wasn't me. In any case, I still have a problem at hand I need to solve in some way. Thanks for your answer though. – jjj Oct 04 '20 at 12:38
  • A simple "no" would not have helped a reader understand the problem fully hence I had to explain what's wrong here. I have also mentioned this can only be done by the application. I don't understand why the answer was still downvoted. – Nilesh PS Oct 04 '20 at 12:42