2

I have a table like this:

account   |   check1          |   check2
1         |   100]200]300     |   101]209]305
2         |   401]502         |   404]511
3         |   600             |   601

I want to separate the records into something like this:

account   |   check1     |   check2
1         |   100        |   101
1         |   200        |   209
1         |   300        |   305
2         |   401        |   404
2         |   502        |   511
.         |     .        |    .
.         |     .        |    .
.         |     .        |    .

How do I do this using SQL server only?

Thanks,

  • 1
    Start by looking at an implementation of split string function: http://stackoverflow.com/questions/314824 This will most likely be within loop and you need a 2nd loop for the split function results. Populate a temp table. You probably want to do this in a stored proc. – Paul Sasik Jun 06 '12 at 03:19
  • On a side note, it looks like you're making an effort towards normalization. You may also want to consider making another column called `CheckNumber` where, instead of having a column for each check, you use a row for each check. – Peter Majeed Jun 06 '12 at 04:16
  • This is a sample. The real data is way that complicated and indeed having the different and sensible culomn names. Multiple values store at one field is common in our database. – user1438730 Jun 06 '12 at 20:01
  • possible duplicate of [Convert Comma Separated column value to rows](http://stackoverflow.com/questions/13873701/convert-comma-separated-column-value-to-rows) – mmmmmm Oct 30 '13 at 20:09

2 Answers2

5

First, you need a split function that can allow you to determine order within the result. This is a multi-statement TVF which uses an IDENTITY column

CREATE FUNCTION dbo.SplitStrings
(
    @List       NVARCHAR(MAX),
    @Delimiter  NVARCHAR(255)
)
RETURNS @t TABLE(ID INT IDENTITY(1,1), Item INT)
AS
BEGIN
    INSERT @t(Item) SELECT SUBSTRING(@List, Number, 
        CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
        FROM sys.all_objects) AS n(Number)
    WHERE Number <= CONVERT(INT, LEN(@List))
        AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
    ORDER BY Number OPTION (MAXDOP 1);

    RETURN;
END
GO

(If you have a Numbers table, you can use that instead of the subquery, and this will also allow you to add WITH SCHEMABINDING to the function's definition, which provides potential performance benefits.)

With the function in place, here is sample usage given the data you've provided and desired results:

DECLARE @x TABLE(account INT, check1 NVARCHAR(1000), check2 NVARCHAR(1000));

INSERT @x SELECT 1, '100]200]300','101]209]305'
UNION ALL SELECT 2, '401]502','404]511'
UNION ALL SELECT 3, '600','601'
UNION ALL SELECT 4, '205]104','304]701'; -- I added this sanity check

SELECT account, check1 = s1.Item, check2 = s2.Item
FROM @x AS x
CROSS APPLY dbo.SplitStrings(x.check1, ']') AS s1
CROSS APPLY dbo.SplitStrings(x.check2, ']') AS s2
WHERE s1.ID = s2.ID
ORDER BY account, s1.ID;

Results:

account  check1  check2
-------  ------  ------
1        100     101
1        200     209
1        300     305
2        401     404
2        502     511
3        600     601
4        205     304
4        104     701

This assumes that you have some kind of validation / enforcement that corresponding values in check1 and check2 columns will always have the same number of values. It also assumes any check1 / check2 value will not exceed about 7,000 characters (again a Numbers table can help make that more flexible).

EDIT

After AndriyM's comments I wanted to come back and re-visit this, mostly to supply a version of the above function which works without using a multi-statement TVF. This uses Andriy's idea ROW_NUMBER() could be used.

CREATE FUNCTION dbo.SplitStrings
(
    @List       NVARCHAR(MAX),
    @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
AS
    RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
        Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number, 
        CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
        FROM sys.all_objects) AS n(Number)
    WHERE Number <= CONVERT(INT, LEN(@List))
        AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
    ) AS y);
GO
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • Your `SplitStrings` thing is pulling and inserting data in one go, which means you could easily rewrite it as an inline TVF, yet you chose a multi-statement one. Could you please explain why? (I mean, if that was not just an arbitrary decision.) I'm asking this because since recently I've always been hearing that inline TVFs should generally be preferred to multi-statement TVFs (where the former are possible, naturally, which is the case here, I believe). – Andriy M Jun 09 '12 at 20:37
  • @AndriyM can you show an inline TVF that doesn't use an IDENTITY column but can guarantee the order of results? The outer query needs some way to join 100 to 101, 200 to 209, etc. I think it can be done as an inline TVF and "work" but I don't think you can guarantee the output order (this is why I added a row where the numbers within the strings weren't both in ascending order). I'd be happy to be proven wrong, but it's going to take more than a comment. :-) (e.g. add a better answer if you have one...) – Aaron Bertrand Jun 09 '12 at 20:51
  • I missed the IDENTITY bit, sorry. I think, you could replace that with an outer select with `ROW_NUMBER() OVER (ORDER BY Number) AS ID`, though. But I don't know if that would be any better. Anyway, I've got the answer to my question, thanks. :) – Andriy M Jun 09 '12 at 21:03
  • @AndriyM since I had to answer a similar problem for a different question, I've updated the answer to include an inline TVF version. Thanks for keeping me honest. – Aaron Bertrand Jun 11 '12 at 19:54
0

@Aaron Bertrand for the records with null value in the second column 'check2' like this:

Account | Check1      | Check2

001     | 100]200     | ]

002     | 300]400     | Null

003     | 500]600]700 | ]]

your function doesn't return the value like this:

Account | Check1      | Check2

001     | 100         | 

001     | 200         | 

002     | 300         | Null

002     | 400         | Null

003     | 500         |

003     | 600         | 

003     | 700         |

How to improve your function to deal with null value or empty string after the last delimiter?

hoyomi
  • 277
  • 1
  • 6
  • 19