6

I have the following table definition

CREATE TABLE _Table 
(
    [Pat] NVARCHAR(8), 
    [Codes] NVARCHAR(50), 
    [C1] NVARCHAR(6), 
    [C2] NVARCHAR(6), 
    [C3] NVARCHAR(6), 
    [C4] NVARCHAR(6), 
    [C5] NVARCHAR(6)
);
GO

INSERT INTO _Table ([Pat], [Codes], [C1], [C2], [C3], [C4], [C5])
VALUES
    ('Pat1', 'U212,Y973,Y982', null, null, null, null, null),
    ('Pat2', 'M653', null, null, null, null, null), 
    ('Pat3', 'U212,Y973,Y983,Z924,Z926', null, null, null, null, null);
GO  

SQL Fiddle here.

Now, I would like to split the codes for each row and populate the Cn columns so we end up with

Pat     Codes                       C1      C2      C3      C4      C5
Pat1    'U212,Y973,Y982'            U212    Y973    Y982    NULL    NULL
Pat2    'M653'                      M653    NULL    NULL    NULL    NULL
Pat3    'U212,Y973,Y983,Z924,Z926'  U212    Y973    Y983    Z924    Z926

I am looking at dynamic SQL but is there a better way...

I have started down the CTE route, but I am weak here. I am essentially looping, removing the first comma separated code and using left to get that code and selected it as C1.

;WITH tmp([Pat], [Codes], [C1], [C2], [C3], [C4], [C5]) AS
(
    SELECT
        Pat,
        STUFF(Codes, 1, CHARINDEX(',', Codes + ','), ''), 
        LEFT(Codes, CHARINDEX(',', Codes + ',') - 1), 
        [C2], 
        [C3], 
        [C4], 
        [C5]
    FROM _Table 
    UNION all
    SELECT
        Pat,
        STUFF(Codes, 1, CHARINDEX(',', Codes + ','), ''), 
        LEFT(Codes, CHARINDEX(',', Codes + ',') - 1), 
        [C2], 
        [C3], 
        [C4], 
        [C5]
    FROM _Table 
    WHERE
        Codes > ''
)
SELECT Pat, Codes,  [C1], [C2], [C3], [C4], [C5] 
FROM tmp 
ORDER BY Pat

This works for one code, but how do I do all 5? Note, in practice this could increase to N codes.

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
MoonKnight
  • 23,214
  • 40
  • 145
  • 277
  • 1
    If it could increase to `N` codes, don't create `N` colums, create `N` rows... That would be the normalised structure, the one ***Relational*** databases are ***designed*** for; automatically extensible and significantly easier to query. https://raresql.com/2012/02/14/split-function-in-sql-server/ – MatBailie May 17 '18 at 15:11
  • 1
    In the first row, why is c4 and c5 null? – UnhandledExcepSean May 17 '18 at 15:18
  • https://stackoverflow.com/questions/29961576/how-can-i-convert-split-function-to-inline-table-valued-udf-in-sql-server?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa – MatBailie May 17 '18 at 15:20
  • @UnhandledExcepSean I have updated the question, this was a typo. – MoonKnight May 17 '18 at 16:10
  • @MatBailie here N is taken to mean some small integer between say 1 and 20. Each Pat can have up to 20 codes in the full system. – MoonKnight May 17 '18 at 16:12
  • @MoonKnight - Doesn't matter, the normal construction would still be to normalise the data such that you have one row per associated element. – MatBailie May 17 '18 at 16:14
  • 1
    Depends on the level of normalization you want - there is no duplication here. Regardless, I did not design the table structure, but have a requirement to perform the operation described above. The links you provided do not help in this regard. Thanks for your time. – MoonKnight May 17 '18 at 16:40
  • The scope of your requirement should be to fix the problem. And not just one time. Future proof it. Why in the world would you continue down a path that is going to continue to be a problem in the future. If this is not some sort of exercise for the sake of exercise, and an actual boss gave this to you...Good luck and godspeed. Otherwise fix it correctly. – user7396598 May 17 '18 at 18:32
  • Use a split function then pivot the results dynamically. – EzLo May 22 '18 at 15:02
  • Are these medical ICD10 & OPCS4 codes? If so that means they can occasionally be more than 4 characters long. – pacreely May 22 '18 at 20:49
  • If these are medical codes and you have the unfortunate task of working with NHS CDS or Spells Converter then I recommend Alan Burstein's initial solution. I've had to work with that data in the past and the CROSS APPLY always worked out best. – pacreely May 22 '18 at 21:23
  • @pacreely AR-DRG codes et al. Mixture of jumbled garbage from an endless supply of different feeder systems by all accounts. – MoonKnight May 23 '18 at 16:05

6 Answers6

6

If I understand the requirement correctly this is extremely simple. No splitting or other type of function, Dynamic SQL, recursive CTEs, PIVOTING or any other skulduggery is necessary.

To perform the "split" you can use CROSS APPLY like so:

SELECT 
  Pat,
  Codes,
  C1 = SUBSTRING(Codes,1,ISNULL(d1.d-1,8000)),
  C2 = SUBSTRING(Codes,d1.d+1, d2.d-d1.d-1),
  C3 = SUBSTRING(Codes,d2.d+1, d3.d-d2.d-1),
  C4 = SUBSTRING(Codes,d3.d+1, d4.d-d3.d-1),
  C5 = SUBSTRING(Codes,d4.d+1, 8000)
FROM _Table
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes),0)))        d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d3.d+1),0))) d4(d);

Returns

Pat      Codes                         C1    C2     C3    C4    C5    
-------- ----------------------------- ----- ------ ----- ----- ------
Pat1     U212,Y973,Y982                U212  Y973   NULL  NULL  NULL
Pat2     M653                          M653  NULL   NULL  NULL  NULL
Pat3     U212,Y973,Y983,Z924,Z926      U212  Y973   Y983  Z924  Z926

Note the super-simple and utltra-efficient execution plan:

enter image description here

You can simplify this even further if the codes are always four character like so:

SELECT
  Pat,
  Codes,
  C1 = NULLIF(SUBSTRING(Codes,1,4),''),
  C2 = NULLIF(SUBSTRING(Codes,6,4),''),
  C3 = NULLIF(SUBSTRING(Codes,11,4),''),
  C4 = NULLIF(SUBSTRING(Codes,16,4),''),
  C2 = NULLIF(SUBSTRING(Codes,21,4),'')
FROM _Table;

To perform the update you would do this for first solution:

UPDATE _Table
SET 
  C1 = SUBSTRING(Codes,1,ISNULL(d1.d-1,8000)),
  C2 = SUBSTRING(Codes,d1.d+1, d2.d-d1.d-1),
  C3 = SUBSTRING(Codes,d2.d+1, d3.d-d2.d-1),
  C4 = SUBSTRING(Codes,d3.d+1, d4.d-d3.d-1),
  C5 = SUBSTRING(Codes,d4.d+1, 8000)
FROM _Table
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes),0)))        d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d3.d+1),0))) d4(d);

If, again, the codes are only four characters long then the update is so easy it feels like cheating:

UPDATE _Table
SET C1 = NULLIF(SUBSTRING(Codes,1,4),''),
    C2 = NULLIF(SUBSTRING(Codes,6,4),''),
    C3 = NULLIF(SUBSTRING(Codes,11,4),''),
    C4 = NULLIF(SUBSTRING(Codes,16,4),''),
    C5 = NULLIF(SUBSTRING(Codes,21,4),'');
Alan Burstein
  • 7,770
  • 1
  • 15
  • 18
  • 1
    I learnt a lot from this answer, so thank you very much for it. I awarded the bounty to lad2025 as he was very quick on the draw and with an answer that built on my initial work. Thanks again for your time. – MoonKnight May 24 '18 at 09:54
  • I have since decided to check the performance of both answers I found useful. However, when running this for 20 codes (production situation) - using the SELECT query, I get `There is insufficient system memory in resource pool 'default' to run this query.`. This seems odd as I have the maximum server mem, set at a reasonable value. – MoonKnight Jun 06 '18 at 10:24
2

I have started down the CTE route, but I am weak here. I am essentially looping, removing the first comma separated code and using left to get that code and selected it as C1.

There are many possible' CSV splitters in SQL Server. I just want to extend your idea with cte (recursive version combined with conditional aggregation):

;WITH tmp([Pat], [Codes],x, lvl) AS
(
  SELECT
   Pat,
   Codes = CAST(STUFF(Codes,1,CHARINDEX(',', Codes + ','), '')AS NVARCHAR(MAX)), 
   x = CAST(LEFT(Codes, CHARINDEX(',', Codes + ',') - 1) AS NVARCHAR(MAX)),
   lvl = 1
  FROM _Table
  -- WHERE c1 IS NULL AND c2 IS NULL AND ...
  -- to avoid recalculating the same rows if run many times
  UNION ALL
  SELECT  Pat,
    Codes = STUFF(Codes,1,CHARINDEX(',', Codes + ','), ''), 
    x= LEFT(Codes, CHARINDEX(',', Codes + ',') - 1),
    lvl = lvl+1
  FROM tmp
  WHERE Codes > ''
)
SELECT Pat
    ,c1 = MAX(IIF(lvl=1, x, NULL))
    ,c2 = MAX(IIF(lvl=2, x, NULL))
    ,c3 = MAX(IIF(lvl=3, x, NULL))
    ,c4 = MAX(IIF(lvl=4, x, NULL))
    ,c5 = MAX(IIF(lvl=5, x, NULL))
    -- add more if necessary
FROM tmp 
GROUP BY Pat
-- OPTION (MAXRECURSION 0);

DBFiddle Demo


And UPDATE:

;WITH tmp([Pat], [Codes],x, lvl) AS
(
    SELECT
        Pat,
        Codes=CAST(STUFF(Codes,1,CHARINDEX(',',Codes+','),'')AS NVARCHAR(MAX)), 
        x = CAST(LEFT(Codes, CHARINDEX(',', Codes + ',') - 1) AS NVARCHAR(MAX)),
        lvl = 1
    FROM _Table
    UNION ALL
    SELECT  Pat,
        Codes = STUFF(Codes, 1, CHARINDEX(',', Codes + ','), ''), 
        x= LEFT(Codes, CHARINDEX(',', Codes + ',') - 1),
        lvl = lvl+1
    FROM tmp
    WHERE Codes > ''
), cte2 AS (
SELECT Pat
    ,c1 = MAX(IIF(lvl=1, x, NULL))
    ,c2 = MAX(IIF(lvl=2, x, NULL))
    ,c3 = MAX(IIF(lvl=3, x, NULL))
    ,c4 = MAX(IIF(lvl=4, x, NULL))
    ,c5 = MAX(IIF(lvl=5, x, NULL))
FROM tmp 
GROUP BY Pat
)
UPDATE _Table
SET c1 = c.c1
   ,c2 = c.c2
   ,c3 = c.c3
   ,c4 = c.c4
   ,c5 = c.c5
FROM _Table t
JOIN cte2 c
  ON t.Pat = c.Pat
 OPTION (MAXRECURSION 0);

DBFiddle Demo

Output:

╔══════╦══════════════════════════╦══════╦══════╦══════╦══════╦══════╗
║ Pat  ║          Codes           ║  C1  ║  C2  ║  C3  ║  C4  ║  C5  ║
╠══════╬══════════════════════════╬══════╬══════╬══════╬══════╬══════╣
║ Pat1 ║ U212,Y973,Y982           ║ U212 ║ Y973 ║ Y982 ║ null ║ null ║
║ Pat2 ║ M653                     ║ M653 ║ null ║ null ║ null ║ null ║
║ Pat3 ║ U212,Y973,Y983,Z924,Z926 ║ U212 ║ Y973 ║ Y983 ║ Z924 ║ Z926 ║
╚══════╩══════════════════════════╩══════╩══════╩══════╩══════╩══════╝

Final thought: The correct way is to normalize schema.

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • 1
    This looks spot on. I appreciate the final thought, but this table comes from a large feeder system. The structure is the way it is and I can't change it. Where the data _eventually_ goes is into one of my 2nd Normal Form databases - but this is much further down the pathway. Thanks very much for your time. – MoonKnight May 22 '18 at 15:05
1

Here's one possible solution: SQL Fiddle

with cte (PAT, CNum, Indx) as 
(
  select PAT
  , 1
  , 1
  from _table

  union all

  select a.PAT
  , b.CNum + 1
  , charindex(',', a.CODES, b.Indx+1)+1
  from _table a
  inner join cte b
  on b.PAT = a.PAT
  where charindex(',', a.CODES, b.Indx+1) > 0
)
select t.PAT
--, t.CODES --include to see the original codes value
, max(case when c1.CNUM = 1 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C1
, max(case when c1.CNUM = 2 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C2
, max(case when c1.CNUM = 3 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C3
, max(case when c1.CNUM = 4 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C4
, max(case when c1.CNUM = 5 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C5
from _Table t
left outer join cte c1
on c1.PAT = t.PAT
left outer join cte c2
on c2.PAT = c1.PAT
and c2.CNum = c1.CNum + 1
group by t.PAT
--, t.CODES --include to see the original codes value

This uses a recursive CTE to fetch the starting positions of each of the values in CODES. The first position is taken as 1; subsequent positions are the char index of the next comma plus one (so we get the character after the comma). We record these positions in value INDX.

We also have CNum in our recursive CTE to record which (C#) field the related record will be related to; this is a simple counter for each result returned.

i.e. For the example data on the SQL Fiddle link, the result of our CTE looks like this:

PAT   | CNum    |   Indx
------+---------+-------
Pat1  |    1    |    1
Pat1  |    2    |    6
Pat1  |    3    |   11
Pat1  |    4    |   16
Pat1  |    5    |   21
Pat2  |    1    |    1
Pat3  |    1    |    1
Pat3  |    2    |    6
Pat3  |    3    |   11
Pat3  |    4    |   16
Pat3  |    5    |   21
Pat3  |    6    |   26
Pat3  |    7    |   31
Pat3  |    8    |   36
Pat3  |    9    |   41
Pat3  |   10    |   46
Pat4  |    1    |   1

We then collapse this down using our group by expression; essentially doing a pivot, putting each CNum value against its related column.

The max is just to ensure that we ignore all the null values / taken the only result with a value for the given CNum.

The code case when c1.CNUM = 1 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end performs a substring between the first character of the current item, taking its length up to 1 before the next character; or if there is no next character the length of the CODES string.


Update

Here's an updated version which makes use of the pivot function / all other logic is as above: SQL Fiddle

with cte (PAT, CNum, Indx) as 
(
  select PAT
  , 1
  , 1
  from @table
  where CODES != ''
  and CODES is not null

  union all

  select a.PAT
  , b.CNum + 1
  , charindex(',', a.CODES, b.Indx+1)+1
  from @table a
  inner join cte b
  on b.PAT = a.PAT
  where charindex(',', a.CODES, b.Indx+1) > 0
)
select PAT
, CODES
, [1] C1
, [2] C2
, [3] C3
, [4] C4
, [5] C5
from 
(
  select t.PAT
  , t.CODES
  , c1.CNum CNum1
  , substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) value
  from @table t
  left outer join cte c1
  on c1.PAT = t.PAT
  left outer join cte c2
  on c2.PAT = c1.PAT
  and c2.CNum = c1.CNum + 1
) x
pivot
(
  max(value) 
  for CNum1 in ([1],[2],[3],[4],[5])
) pvt
order by PAT
JohnLBevan
  • 22,735
  • 13
  • 96
  • 178
1

Though I really like Alan Burstein's answer, another option involves using Jeff Moden's DelimitedSplit8K, conditional aggregation and stacking a couple of common table expressions to get the values you want:

;WITH CTE1 AS
(
    SELECT Pat, Codes, ItemNumber, Item
    FROM _Table
    CROSS APPLY [dbo].[DelimitedSplit8K](Codes, ',')
), CTE2 AS
(
    SELECT  Pat, 
            Codes, 
            MAX(CASE WHEN ItemNumber = 1 THEN Item END) As V1, 
            MAX(CASE WHEN ItemNumber = 2 THEN Item END) As V2, 
            MAX(CASE WHEN ItemNumber = 3 THEN Item END) As V3, 
            MAX(CASE WHEN ItemNumber = 4 THEN Item END) As V4, 
            MAX(CASE WHEN ItemNumber = 5 THEN Item END) As V5
    FROM CTE1
    GROUP BY [Pat], [Codes]
)

UPDATE t
SET C1 = V1,
    C2 = V2,
    C3 = V3,
    C4 = V4,
    C5 = V5
FROM _Table t
JOIN CTE2 ON t.Pat = CTE2.Pat

You can see an online sample or rextester.

Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
1

My suggestion uses an updatable CTE and XML as splitter. XML allows to address each fragment by its position:

WITH updatableCTE AS
(
    SELECT *
          ,CAST('<x>' + REPLACE(Codes,',','</x><x>') + '</x>' AS XML) AS CastedToXml
    FROM _Table
)
UPDATE updatableCTE SET  C1=CastedToXml.value('/x[1]','nvarchar(6)')
                        ,C2=CastedToXml.value('/x[2]','nvarchar(6)')  
                        ,C3=CastedToXml.value('/x[3]','nvarchar(6)')  
                        ,C4=CastedToXml.value('/x[4]','nvarchar(6)')  
                        ,C5=CastedToXml.value('/x[5]','nvarchar(6)'); 

SELECT * FROM _Table

This is very easy to scale to any needed number of fragments.

Shnugo
  • 66,100
  • 9
  • 53
  • 114
0

What MatBailie is trying to get you to see is that having a comma separated list as a column, or having C1 - C(n) columns is a bad idea.

Better to make a second table. It will have a foreign key to the parent table (_Table in your example).

To do this you need to do a couple of things.

First, add an identity column to _Table. This will give you a unique key to use in other tables. Identity is an easy solution without knowing the full scope of your data. You could make it a different, unique column.

Second make a child table, for example PatCodes. It's columns should included:

Id int not null (identity)
_TableId int not null (fk to _Table)
Code varchar(xx) not null 

(optional that might be nice to have)
Description varchar(xxx) null 
DateUpdated datetime null
Active bit not null default 1 

In this way you can have zero-to-many codes for each entry in _Table. You will not need to know the maximum (which could even change in the future) number of Cx columns to add to _Table. You will not be flooded with a large number of NULL column values in _Table. You have a ton of flexibility for managing updates to changing codes for a given _Table entry. I could go on and on about the benefits of normalizing this correctly from the start.

Changing your code to populate this table instead of an unknown number of columns on _Table is also easy to do.

Just do this now, before you end up with a ton of nasty data in the database and/or an application with loads of clutter around your _Table objects, it will save you much time and effort.

user7396598
  • 1,269
  • 9
  • 6
  • 2
    Thank you for taking the time to answer, but I cannot be any clearer than I have in the comments above - _this table structure is not my property_. I appreciate normalizing here, is the best way to go, but I don't have the ability to change the table above. I have merely been asked, by a consultant, who is working with a stubborn client, if I can help him extract the codes from a comma separated list in the way described above. That is all. – MoonKnight May 18 '18 at 07:22