How to split a comma delimited column into rows along with a non-unique ID column through SQL

Question

I would love some advice on how to accomplish this task. How do I create a sql statement to turn the table into the expected output shown? Would just appreciate a point to the right direction!

Is it possible to do this with only read permission? So no ability to create a function or extra tables. I have SQL server 2012, but can upgrade to 2016.

Edit: I'm really sorry for the previous bad example for the ID column. The ID column is now a unique primary key. Thank you for all the advice and help.

Table:
ID      Value
100     aaa, bbb, ccc
101     aaa, bbb, ccc
102     aaa, bbb, ccc
103     abc, bcb, zzz
104     abc, bcb, zzz
105     abc, bcb, zzz


Expected:
ID      Value
100     aaa
101     bbb
102     ccc
103     abc
104     bcb
105     zzz

Are you sure your table data is correct? It would seem that there would only be one row with each `id` value and then you would want to parse the `value` column into separate rows. — daShier, Aug 15 '19 at 01:26
Parsing into three columns is doable. Just search for commas with nested calls to `charindex()`. Use `cross apply` and `values` to get multiple rows. — shawnt00, Aug 15 '19 at 01:38
Possible duplicate of [T-SQL split string](https://stackoverflow.com/questions/10914576/t-sql-split-string) — Squirrel, Aug 15 '19 at 05:59

Jim Jimson · Accepted Answer · 2019-08-15T10:45:13.640

0

For this one you're going to need Jeff Moden's Dilimited 8K Split function which you can find here. Once you've installed that function (which you'll use many times until SQL Server 2016 when SPLIT_STRING comes) you can use the below code to solve your problem:

DECLARE @t TABLE
(
Id INTEGER,
Val VARCHAR(20)
);

INSERT INTO @t VALUES
(1,'aaa,bbb,ccc'),
(1,'aaa,bbb,ccc'),
(1,'aaa,bbb,ccc'),
(2,'abc,bcb,zzz'),
(2,'abc,bcb,zzz'),
(2,'abc,bcb,zzz');

SELECT DISTINCT t.Id, item, ItemNumber
FROM @t t
CROSS APPLY [dbo].[DelimitedSplit8K](t.Val, ',');

--results
Id          item     RowNumber
----------- -------- --------------------
1           aaa      1
1           bbb      2
1           ccc      3
2           abc      1
2           bcb      2
2           zzz      3

edited Aug 15 '19 at 10:45

answered Aug 15 '19 at 06:08

Jim Jimson

2,368
3
17
40

built in `string-_split` was introduced in 2016, not 2017 version. Also, there are other ways to split a string prior to 2016 version. Read Aaron Bertrand's [Split strings the right way – or the next best way](https://sqlperformance.com/2012/07/t-sql-queries/split-strings) for details. – Zohar Peled Aug 15 '19 at 09:01
And one more thing - you are using `row_number()` with the wrong `order by` - you should be using `ItemNumber` (second column returned from Moden's function) in the `order by` clause, otherwise you'll get an arbitrary row_number that might or might not be correct. – Zohar Peled Aug 15 '19 at 09:09
I've edited my answer to show the correct version SPLIT_STRING came out in. On the id front, it produces the correct result given the input. The ids are the same with the query I've written. Using the ItemNumber literally makes no difference here, and your comment only makes things more confusing. – Jim Jimson Aug 15 '19 at 09:19
`PARTITION BY Id ORDER BY id` means you are ordering by a constant value - which is as arbitrary as no order by at all. You might be getting the current result in your tests, but nothing guarantees that it will be consistent - which means that any time you run the query you risk having the wrong order. – Zohar Peled Aug 15 '19 at 09:24
1

I'm sorry, English is not my native language and I'm not familiar with the phrase "come across as a burk". Anyway, glad we could agree. – Zohar Peled Aug 15 '19 at 11:15

score 0 · Answer 2 · answered Aug 15 '19 at 15:47

To improve on Jim's answer above I would suggest that we use remove the duplicates before performing the split if possible. This will result in a much smaller number of duplicates to filter. Using this sample data...

CREATE TABLE #t
(
  Id  INT,
  Val VARCHAR(20)
);

INSERT INTO #t VALUES
(1,'aaa,bbb,ccc'),
(1,'aaa,bbb,ccc'),
(1,'aaa,bbb,ccc'),
(2,'abc,bcb,zzz'),
(2,'abc,bcb,zzz'),
(2,'abc,bcb,zzz');

... we can compare execution plans to see what I mean. Run with "Include Actual Execution Plan" turned on:

-- Remove Duplicates BEFORE the split
SELECT t.Id, s.ItemNumber, s.item
FROM (SELECT DISTINCT t.* FROM #t AS t) AS t
CROSS APPLY [dbo].[DelimitedSplit8K](t.Val, ',') AS s;

-- Remove Duplicates AFTER the split
SELECT DISTINCT t.Id, s.ItemNumber, s.item
FROM #t AS t
CROSS APPLY [dbo].[DelimitedSplit8K](t.Val, ',') AS s;

Now the execution plan below. With this sample data, removing duplicates BEFORE the split results in 6 rows in the inner loop join and merge join operators, and only 2 rows getting sorted. Removing duplicates AFTER the split results in 18 rows in the Loop and Merge join operators and 6 rows getting sorted.

How to split a comma delimited column into rows along with a non-unique ID column through SQL

2 Answers2