40

I seem to come up against this problem a lot, where I have data that's formatted like this:

+----+----------------------+
| id | colors               |
+----+----------------------+
| 1  | Red,Green,Blue       |
| 2  | Orangered,Periwinkle |
+----+----------------------+

but I want it formatted like this:

+----+------------+
| id | colors     |
+----+------------+
| 1  | Red        |
| 1  | Green      |
| 1  | Blue       |
| 2  | Orangered  |
| 2  | Periwinkle |
+----+------------+

Is there a good way to do this? What is this kind of operation even called?

Steve Chambers
Jason Hamje

6 Answers

26

You could use a query like this:

SELECT
  id,
  SUBSTRING_INDEX(SUBSTRING_INDEX(colors, ',', n.digit+1), ',', -1) color
FROM
  colors
  INNER JOIN
  (SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) n
  ON LENGTH(REPLACE(colors, ',' , '')) <= LENGTH(colors)-n.digit
ORDER BY
  id,
  n.digit

Please see fiddle here. Note that this query supports at most 4 colors per row. To handle more, extend the subquery to return more numbers, or join against a table that contains 10 or 100 numbers.
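To make the nested SUBSTRING_INDEX trick easier to follow, here is a rough Python sketch of the same logic. The helper names `substring_index`, `nth_item` and `explode` are made up for illustration; the first one only approximates MySQL's function for a non-empty delimiter.

```python
def substring_index(s, delim, count):
    """Approximate MySQL SUBSTRING_INDEX(s, delim, count) for a non-empty delim."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter from the end
        return delim.join(parts[count:])
    return ""


def nth_item(s, n):
    # Inner call keeps the first n+1 items, outer call keeps the last of those
    return substring_index(substring_index(s, ",", n + 1), ",", -1)


def explode(s, numbers=range(4)):
    # Same filter as the join condition: keep digits up to the comma count
    commas = len(s) - len(s.replace(",", ""))
    return [nth_item(s, n) for n in numbers if n <= commas]


print(explode("Red,Green,Blue"))        # ['Red', 'Green', 'Blue']
print(explode("a,b,c,d,e"))             # ['a', 'b', 'c', 'd'] (only 4 numbers available)
print(explode("a,b,c,d,e", range(10)))  # ['a', 'b', 'c', 'd', 'e']
```

The second call shows why the numbers subquery has to be at least as long as the longest list: any item beyond the available digits is silently dropped.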

fthiella
  • This isn't quite what I'm looking for, I was more looking for something that can handle N rows per id. Thanks though :) – Jason Hamje Jun 26 '13 at 17:25
  • @JasonHamje if you need to use a query and not a stored procedure, there's no other way :) – fthiella Jun 27 '13 at 17:55
  • Thanks a ton. Used over [Here](http://stackoverflow.com/a/39559794) (Edit2 chunk) and gave attribution :p – Drew Sep 18 '16 at 20:03
  • @Drew you're welcome! thanks to you for the attribution! ;) – fthiella Sep 19 '16 at 06:52
  • Nice answer. In the general case, this method is very powerful if combined with the technique from [this answer](https://stackoverflow.com/questions/304461#2652051) for generating a long sequence of numbers. – Steve Chambers May 20 '18 at 17:58
  • I have a total of 30 comma-separated words. Using the above code, it does not show all the records; it shows only 3 to 4 words. – Phoenix Aug 10 '19 at 07:14
  • @Phoenix yes, you should add more numbers to the `select 0 select 1 select 2 etc.` to support more than 4 words .. or you should use a join to a numbers table – fthiella Aug 23 '19 at 08:20
14

I think a stored procedure is what you need. See "MySQL split column string into rows":

DELIMITER $$

DROP PROCEDURE IF EXISTS explode_table $$
CREATE PROCEDURE explode_table(bound VARCHAR(255))

BEGIN

DECLARE id INT DEFAULT 0;
DECLARE value TEXT;
DECLARE occurrence INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
-- Holds one split-off piece, so it must be a string type (was INT)
DECLARE splitted_value VARCHAR(255);
DECLARE done INT DEFAULT 0;
DECLARE cur1 CURSOR FOR SELECT table1.id, table1.value
                                     FROM table1
                                     WHERE table1.value != '';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

DROP TEMPORARY TABLE IF EXISTS table2;
CREATE TEMPORARY TABLE table2(
`id` INT NOT NULL,
`value` VARCHAR(255) NOT NULL
) ENGINE=Memory;

OPEN cur1;
  read_loop: LOOP
    FETCH cur1 INTO id, value;
    IF done THEN
      LEAVE read_loop;
    END IF;

    -- Number of pieces = number of delimiters + 1
    SET occurrence = (SELECT (CHAR_LENGTH(value)
                             - CHAR_LENGTH(REPLACE(value, bound, '')))
                             DIV CHAR_LENGTH(bound) + 1);
    SET i=1;
    WHILE i <= occurrence DO
      -- Take the first i pieces, cut off the first i - 1, then strip the
      -- leading delimiter (bound, rather than a hard-coded ',')
      SET splitted_value =
      (SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(value, bound, i),
      CHAR_LENGTH(SUBSTRING_INDEX(value, bound, i - 1)) + 1), bound, ''));

      INSERT INTO table2 VALUES (id, splitted_value);
      SET i = i + 1;

    END WHILE;
  END LOOP;
  CLOSE cur1;

  SELECT * FROM table2;
END $$

DELIMITER ;

-- Example call, splitting on commas
CALL explode_table(',');
kmas
2

This saved me many hours! Taking it a step further: in a typical implementation there would in all likelihood be a table, color_list, that enumerates the colours against an identifying key. A new colour can then be added without modifying the query, and the potentially endless UNION clause can be avoided altogether by changing the query to this:

SELECT id,
  SUBSTRING_INDEX(SUBSTRING_INDEX(colors, ',', n.digit+1), ',', -1) color
FROM
  colors
  INNER JOIN
  (select id as digit from color_list) n
  ON LENGTH(REPLACE(colors, ',' , '')) <= LENGTH(colors)-n.digit
ORDER BY id, n.digit;

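It is important that the ids in table color_list remain sequential, however. They also need to start at 0, because the query extracts item n.digit+1; if the ids start at 1, the first colour of every row is skipped.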

gerrit_hoekstra
2

No need for a stored procedure. A CTE is enough:

CREATE TABLE colors(id INT,colors TEXT);
INSERT INTO colors VALUES (1, 'Red,Green,Blue'), (2, 'Orangered,Periwinkle');

WITH RECURSIVE
  unwound AS (
    SELECT *
      FROM colors
    UNION ALL
    SELECT id, regexp_replace(colors, '^[^,]*,', '') colors
      FROM unwound
      WHERE colors LIKE '%,%'
  )
  SELECT id, regexp_replace(colors, ',.*', '') colors
    FROM unwound
    ORDER BY id
;
+------+------------+
| id   | colors     |
+------+------------+
|    1 | Red        |
|    1 | Green      |
|    1 | Blue       |
|    2 | Orangered  |
|    2 | Periwinkle |
+------+------------+
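To try the recursive idea without a MySQL server, here is a sketch that uses Python's built-in sqlite3 module. Stock SQLite has no regexp_replace, so this variant swaps in instr and substr and tracks a position column; that substitution, the column names `pos`, `head` and `rest`, and the wrapper function `split_rows` are my assumptions, not part of the query above.

```python
import sqlite3


def split_rows(rows):
    """Split comma-separated values into one row per item via a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE colors (id INTEGER, colors TEXT)")
    conn.executemany("INSERT INTO colors VALUES (?, ?)", rows)
    query = """
    WITH RECURSIVE unwound(id, pos, head, rest) AS (
        -- Anchor: nothing split off yet; a trailing comma marks the last item
        SELECT id, 0, '', colors || ',' FROM colors
        UNION ALL
        -- Step: move the text before the first comma from rest into head
        SELECT id, pos + 1,
               substr(rest, 1, instr(rest, ',') - 1),
               substr(rest, instr(rest, ',') + 1)
          FROM unwound
         WHERE rest <> ''
    )
    SELECT id, head FROM unwound WHERE pos > 0 ORDER BY id, pos
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


data = [(1, "Red,Green,Blue"), (2, "Orangered,Periwinkle")]
for row in split_rows(data):
    print(row)
```

Ordering by `pos` as well as `id` keeps the items in their original order within each id, which ordering by id alone does not guarantee.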
JoL
  • If only this existed in 2013! So cool. I don't work with MySQL very often anymore but if I do I'll definitely remember to check this out. – Jason Hamje May 29 '21 at 01:00
  • @JasonHamje It's not MySQL/MariaDB-specific. Same code works with PostgreSQL. And if one loads an extension to add the function `regexp_replace`, it can also be run on SQLite. – JoL May 29 '21 at 05:47
0

Note that this can be done without creating a temporary table. The derived table below has 10 rows, so it handles up to 10 values per row:

select id, substring_index(substring_index(genre, ',', n), ',', -1) as genre
from my_table
join
(SELECT @row := @row + 1 as n FROM
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t,
(SELECT @row:=0) r) as numbers
  on char_length(genre)
    - char_length(replace(genre, ',', ''))  >= n - 1
yael alfasi
-1

If the delimiter is part of the data but enclosed in double quotes, how can we split it?

Example: first,"second,s",third

It should come out as three values: "first", "second,s" and "third".

sailesh
  • A little late looking through this, but why not just remove the quotes using replace and then do what the answer says? – Harry Jan 08 '19 at 03:06
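On the quoted-delimiter case: stripping the quotes first would still split second,s in two, because the comma inside it remains. SUBSTRING_INDEX has no notion of quoting, so one option is to do the split in application code. A minimal sketch, assuming the values follow ordinary CSV quoting rules; `split_quoted` is a hypothetical helper name:

```python
import csv


def split_quoted(line, delimiter=","):
    """Split one line on the delimiter, keeping double-quoted fields intact."""
    # csv.reader expects an iterable of lines, so wrap the single line in a list
    return next(csv.reader([line], delimiter=delimiter, quotechar='"'))


print(split_quoted('first,"second,s",third'))  # ['first', 'second,s', 'third']
```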