0

I have a TSQL (MSSQL) table which contains record in below format

Id Column1 Column2
1 a/b/c apple/banana/cucumber

I want to split the records in below format

Id Column1 Column2
1 a apple
1 b banana
1 c cucumber

Column1 and Column2 maintain relationship using the "/" separator and related to each other in same sequence.

I tried to split the column with help of CHARINDEX & SUBSTRING but I am not able to maintain relationship between two columns.

Thom A
  • 88,727
  • 11
  • 45
  • 75
Vishal
  • 31
  • 6
  • 2
    You should really be fixing your design and not storing the data in a delimited format at all. You'll need to find a string splitter that returns the ordinal position here; a search on your favourite search engine will give you many examples, I am sure. – Thom A Jan 18 '22 at 15:11
  • you can use split_string to accomplish your desired output. use string_split to put values in a temp table. – Golden Lion Jan 18 '22 at 15:12
  • While asking a question, you need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example): (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;). – Yitzhak Khabinsky Jan 18 '22 at 15:36
  • @GoldenLion From [`String_Split`](https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15): "The output rows might be in any order. The order is _not_ guaranteed to match the order of the substrings in the input string." I'm not sure what `split_string`, also mentioned in your comment, does. – HABO Jan 18 '22 at 15:41
  • @HABO - It's only available on Azure SQL right now but SPLIT_STRING includes an [ORDER BY ordinal]. You can get the ordinal via `ROW_NUMBER() (OVER ordinal)`. Hopefully they include it with the next non-AzuSQL version of Server. https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15#arguments – Alan Burstein Jan 19 '22 at 03:26
  • @AlanBurstein My broken must be browser. Your comment has a link to documentation for `String_Split` as an explanation of `Split_String`. Or confused am I? – HABO Jan 19 '22 at 04:02
  • @habo - correct. But notice that, in the `STRING_SPLIT` documentation, they talk about how you can sort by ordinal, e.g. `ORDER BY ordinal`. In Azure: `ORDER BY ordinal` DOES guarantee ordering. You can also use `ROW_NUMBER() OVER ( ORDER BY ss.ordinal) FROM SPLIT_STRING('a,b,c',',') AS ss`; this would guarantee that you get: 1 for "a", 2 for "b" and 3 for "C". Again, this is brand new (a few months) but is only available in Azure, sadly. SSMS for 2019 already recognizes ORDER BY ordinal as valid SPLIT_STRING syntax but will through an error. – Alan Burstein Jan 19 '22 at 05:14
  • @AlanBurstein `'split_string'` != `'string_split'`. Your `Row_Number` example uses a function that doesn't exist, with or without `ordinal`. – HABO Jan 19 '22 at 14:35
  • @HABO I meant STRING_SPLIT. I make that mistake all the time but SSMS intellisense corrects me. :) – Alan Burstein Jan 19 '22 at 16:40

5 Answers5

3

You can add a function to split strings.

Then cross apply to the split parts of Column1 and Column2.

create table test (
 Id int identity primary key, 
 Column1 varchar(30),
 Column2 varchar(30)
);

insert into test (Column1, Column2) values
('a/b/c', 'apple/banana/cucumber'), 
('d/e/f', 'orange/prune/onion');

(UDF copied from here )

CREATE FUNCTION dbo.fnString_Split
(
    @str    nvarchar(4000), 
    @delim  nchar(1)
)
RETURNS TABLE 
WITH SCHEMABINDING 
AS 
RETURN
(
  WITH RCTE AS (
    SELECT 
      1 AS ordinal
    , ISNULL(NULLIF(CHARINDEX(@delim, @str),0), LEN(@str)) AS pos
    , LEFT(@str, ISNULL(NULLIF(CHARINDEX(@delim, @str),0)-1, LEN(@str))) AS value
    UNION ALL
    SELECT 
      ordinal+1
    , ISNULL(NULLIF(CHARINDEX(@delim, @str, pos+1), 0), LEN(@str))
    , SUBSTRING(@str, pos+1, ISNULL(NULLIF(CHARINDEX(@delim, @str, pos+1),0)-pos-1, LEN(@str)-pos )) 
    FROM RCTE
    WHERE pos < LEN(@str)
  ) 
  SELECT ordinal, value
  FROM RCTE
);
select 
  t.Id
, ca.Column1
, ca.Column2
from test t
cross apply (
  select 
    s1.ordinal
  , s1.value as Column1
  , s2.value as Column2
  from dbo.fnString_Split(t.Column1,'/') as s1
  join dbo.fnString_Split(t.Column2,'/') as s2
    on s1.ordinal = s2.ordinal
) ca;
Id Column1 Column2
1 a apple
1 b banana
1 c cucumber
2 d orange
2 e prune
2 f onion

Demo on db<>fiddle here

LukStorms
  • 28,916
  • 5
  • 31
  • 45
2

Please try the following solution.

It is JSON based, and will work starting from SQL Server 2016 onwards.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT, ColB varchar(8000), ColC varchar(8000));
INSERT INTO @tbl VALUES
(1,'a/b/c','apple/banana/cucumber');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = '/';

WITH rs AS 
(
     SELECT *
         , ar1 = '["' + REPLACE(ColB, @separator, '","') + '"]'
         , ar2 = '["' + REPLACE(ColC, @separator, '","') + '"]'
     FROM @tbl
 )
 SELECT ID, ColB.[value] AS [ColB], ColC.[value] AS ColC
 FROM rs
    CROSS APPLY OPENJSON (ar1, N'$') AS ColB
    CROSS APPLY OPENJSON (ar2, N'$') AS ColC
 WHERE ColB.[key] = ColC.[key];

Output

+----+------+----------+
| ID | ColB |   ColC   |
+----+------+----------+
|  1 | a    | apple    |
|  1 | b    | banana   |
|  1 | c    | cucumber |
+----+------+----------+
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
2

I know there are already a few answers and one has been accepted but there are some vital performance factors to consider. If the number of values is always three then (or are always low, say less than 5 or 6) then the Cascading APPLY technique will be fastest by far. This solution assumes there will always be 3 items. It can be easily modified to handle a variable, yet small number of items.

CASCADING APPLY SOLUTION:

DECLARE @table TABLE
(
  SomeId  INT IDENTITY,
  S1 VARCHAR(1000),
  s2 VARCHAR(1000)
);
INSERT @table VALUES ('a/b/c','apple/banana/cucumber'),
                     ('d/d2/f','dog/donkey/fish'),('x/y/z','x-ray/yo-yo/zeta');
SELECT
  SomeId = f.SomeId,
  Col1   = f2.C1,
  Col2   = f2.C2
FROM
(
  SELECT
    t.SomeId, 
    SUBSTRING(t.S1, 1,       c1.P-1),
    SUBSTRING(t.S1, c1.P+1,  c2.P - c1.P-1),
    SUBSTRING(t.S1, c2.P+1,  8000),
    SUBSTRING(t.S2, 1,       c1.P2 - 1),
    SUBSTRING(t.S2, c1.P2+1, c2.P2 - c1.P2-1),
    SUBSTRING(t.S2, c2.P2+1, 8000)
  FROM        @table AS t
  CROSS APPLY (VALUES(CHARINDEX('/',t.S1),CHARINDEX('/',t.S2)))                AS c1(P,P2)
  CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c1.P+1),CHARINDEX('/',t.S2,c1.P2+1))) AS c2(P,P2)
  CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c2.P+1),CHARINDEX('/',t.S2,c2.P2+1))) AS c3(P,P2)
) AS f(SomeId,c1_1,c1_2,c1_3,c2_1,c2_2,c2_3)
CROSS APPLY (VALUES (c1_1, c2_1), (c1_2, c2_2),(c1_3,c2_3)) AS f2(c1,c2);

If you are using SQL Azure you have STRING_SPLIT with the ordinal option.

DECLARE @table TABLE
(
  SomeId  INT IDENTITY,
  S1 VARCHAR(1000),
  s2 VARCHAR(1000)
);
INSERT @table VALUES ('a/b/c','apple/banana/cucumber'),
                     ('d/d2/f','dog/donkey/fish'),('x/y/z','x-ray/yo-yo/zeta');

SELECT TOP(1) WITH TIES
  t.SomeId, t.S1, t.S2, Col1 = split1.[value], Col2 = split2.[value]
FROM        @table                  AS t
CROSS APPLY STRING_SPLIT(t.S1 ,'/') AS split1
CROSS APPLY STRING_SPLIT(t.S2 ,'/') AS split2
ORDER BY ABS(
  ROW_NUMBER() OVER (PARTITION BY t.SomeId, split1.value ORDER BY split1.value)-
  ROW_NUMBER() OVER (PARTITION BY t.SomeId, split2.value ORDER BY split1.value));

^^^ This will only work if items are alphabetical (not practical) BUT, on Azure, you can change the ORDER BY to:

ORDER BY ABS(
  ROW_NUMBER() OVER (PARTITION BY t.SomeId, split1.value ORDER BY split1.ordianal) -
  ROW_NUMBER() OVER (PARTITION BY t.SomeId, split2.value ORDER BY split1.ordianal));

Comparing ALL techniques posted

Now lets compare all solutions posted thus far to understand the dramatic performance differences. I built a basic test harness and ran it, first with 100K rows, then with a million rows. Solutions include:

  • dbo.udf_SplitList by Andy3B
  • dbo.fnString_Split by LukStorms
  • a Recursive CTE - Numbers/tally table solution by DhruvJoshi
  • an improved version of DhruvJoshi's solution with a faster tally table
  • Yitzhak Khabinsky's JSON solution
  • My Cascading APPLY solution

The dbo.udf_SplitList solution is the slowest at 246 seconds for a million rows. Scalar functions are dreadfully slow but the recursion is making things much worse here.

The dbo.fnString_Split solution does much better at 85 seconds. To do better we need to lose the scalar udf's.

DhruvJoshi's recursive CTE solution gets us down to 65 seconds, a 50% improvement. I improved that his solution by rewriting the numbers table to not use recursion; this gets us another 50% speed increase, down to 45 seconds.

Yitzhak Khabinsky's is the first set-based solution; he is leveraging JSON. Here we have a 200%+ performance boost, down to 16 seconds. The Cascading APPLY solution improves on Yitzhak's solution by another 400%+; down to under three seconds.

Watch out for Scalar UDFs, Recursion for counting and loops. Set-based always rules.

IF OBJECT_ID('tempdb..#table') IS NOT NULL DROP TABLE #table;
GO
SELECT TOP(100000)
  SomeId = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
  S1     = 'A/B/C',
  S2     = REPLACE(LEFT(NEWID(),18),'-','/')
INTO #table
FROM sys.all_columns, sys.all_columns a
GO

PRINT CHAR(10)+'dbo.udf_SplitList -Andy3B'+CHAR(10)+REPLICATE('-',90);
GO
 DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
 
 SELECT @ID = t.SomeId, @C1 = x.Code, @C2 = x.Value
 FROM    #table AS t
 CROSS APPLY
 (
   SELECT Code  = C.Value,
          Value = V.Value
   FROM    dbo.udf_SplitList(t.S1, '/') AS C
   JOIN    dbo.udf_SplitList(t.S2, '/') AS V ON V.ID = C.ID
 ) AS X;

 PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'dbo.fnString_Split - LukStorms'+CHAR(10)+REPLICATE('-',90);
GO
 DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);

  select @ID = t.SomeId, @C1 = ca.Column1, @C2 = ca.Column2
  from   #table AS t
  cross apply (
    select 
      s1.ordinal
    , s1.value as Column1
    , s2.value as Column2
    from dbo.fnString_Split(t.S1,'/') as s1
    join dbo.fnString_Split(t.S2,'/') as s2
      on s1.ordinal = s2.ordinal) AS ca;

 PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'Recursive CTE - TALLY TABLE DhruvJoshi'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);

;with nums as
(
  select 1 as num union all 
  select num +1 as num
  from   nums 
  where  num <80
)
select 
   @ID = X.SomeId,
   @C1 = substring(X.S1,X.b,X.e-X.b),
   @C2 = substring(Y.S2,Y.b,Y.e-Y.b)
from 
(
  select 
    t.*,
    e=N.Num,
    r=row_number()      over (partition by t.SomeId order by N.num),
    b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0)+1
  from      #table AS t
  left join nums   AS N 
    on      charindex('/',t.S1+'/',N.num)=N.num 
) AS X
left join
(
  select 
    t.*,
    e= N.Num,
    r=row_number()      over (partition by t.SomeId order by N.num),
    b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0)+1
  from      #table AS t
  left join nums   AS N 
     on charindex('/', t.S2+'/',N.num)=N.num 
) AS Y
  ON X.SomeId = Y.SomeId and X.r=Y.r;
 
 PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'TALLY TABLE without Recursion DhruvJoshi'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);

;with nums as
(
  SELECT num = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e1(x),
       (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e2(x),
       (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e3(x),
       (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e4(x)
)
select 
   @ID = X.SomeId,
   @C1 = substring(X.S1,X.b,X.e-X.b),
   @C2 = substring(Y.S2,Y.b,Y.e-Y.b)
from 
(
  select 
    t.*,
    e=N.Num,
    r=row_number()      over (partition by t.SomeId order by N.num),
    b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0)+1
  from      #table AS t
  left join nums   AS N 
    on      charindex('/',t.S1+'/',N.num)=N.num 
  WHERE n.num <= LEN(t.S1)
) AS X
left join
(
  select 
    t.*,
    e= N.Num,
    r=row_number()      over (partition by t.SomeId order by N.num),
    b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0)+1
  from      #table AS t
  left join nums   AS N 
     on charindex('/', t.S2+'/',N.num)=N.num
  WHERE n.num <= LEN(t.S2)
) AS Y
  ON X.SomeId = Y.SomeId and X.r=Y.r; 
 PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'JSON - Yitzhak Khabinsky'+CHAR(10)+REPLICATE('-',90);
GO
 DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
 DECLARE @separator CHAR(1) = '/';
 
 WITH rs AS 
 (
      SELECT *
          , ar1 = '["' + REPLACE(t.S1, @separator, '","') + '"]'
          , ar2 = '["' + REPLACE(t.S2, @separator, '","') + '"]'
      FROM #table AS t
 )
 SELECT @id = SomeID, @C1 = ColB.[value], @C2 = ColC.[value] 
 FROM        rs                   AS rs
 CROSS APPLY OPENJSON (ar1, N'$') AS ColB
 CROSS APPLY OPENJSON (ar2, N'$') AS ColC
 WHERE       ColB.[key] = ColC.[key];
 
 PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'CROSS APPLY TECHNIQUE'+CHAR(10)+REPLICATE('-',90);
GO
 DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
 
 SELECT
    @ID = f.SomeId,
    @C1   = f2.C1,
    @C2   = f2.C2
  FROM
  (
    SELECT
      t.SomeId, 
      SUBSTRING(t.S1, 1,       c1.P-1),
      SUBSTRING(t.S1, c1.P+1,  c2.P - c1.P-1),
      SUBSTRING(t.S1, c2.P+1,  8000),
      SUBSTRING(t.S2, 1,       c1.P2 - 1),
      SUBSTRING(t.S2, c1.P2+1, c2.P2 - c1.P2-1),
      SUBSTRING(t.S2, c2.P2+1,  8000)
    FROM        #table AS t
    CROSS APPLY (VALUES(CHARINDEX('/',t.S1),CHARINDEX('/',t.S2)))                AS c1(P,P2)
    CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c1.P+1),CHARINDEX('/',t.S2,c1.P2+1))) AS c2(P,P2)
    CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c2.P+1),CHARINDEX('/',t.S2,c2.P2+1))) AS c3(P,P2)
  ) AS f(SomeId,c1_1,c1_2,c1_3,c2_1,c2_2,c2_3)
  CROSS APPLY (VALUES (c1_1, c2_1), (c1_2, c2_2),(c1_3,c2_3)) AS f2(c1,c2);
 
 PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

100K row Test Results:

dbo.udf_SplitList -Andy3B
------------------------------------------------------------------------------------------
Beginning execution loop
24593
24566
24530
Batch execution completed 3 times.

dbo.fnString_Split - LukStorms
------------------------------------------------------------------------------------------
Beginning execution loop
8147
8260
8257
Batch execution completed 3 times.

Recursive CTE - TALLY TABLE DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
6867
6733
6850
Batch execution completed 3 times.

TALLY TABLE without Recursion DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
4677
4630
4620
Batch execution completed 3 times.

JSON - Yitzhak Khabinsky
------------------------------------------------------------------------------------------
Beginning execution loop
1667
1653
1670
Batch execution completed 3 times.

CROSS APPLY TECHNIQUE
------------------------------------------------------------------------------------------
Beginning execution loop
283
280
284
Batch execution completed 3 times.

1 million row Test Results:

dbo.udf_SplitList -Andy3B
------------------------------------------------------------------------------------------
Beginning execution loop
246057
245296
247017
Batch execution completed 3 times.

dbo.fnString_Split - LukStorms
------------------------------------------------------------------------------------------
Beginning execution loop
85340
83010
83674
Batch execution completed 3 times.

Recursive CTE - TALLY TABLE -DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
67226
64910
64740
Batch execution completed 3 times.

TALLY TABLE without Recursion DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
46777
44630
44623
Batch execution completed 3 times.

JSON - Yitzhak Khabinsky
------------------------------------------------------------------------------------------
Beginning execution loop
16710
16830
16520
Batch execution completed 3 times.

CROSS APPLY TECHNIQUE
------------------------------------------------------------------------------------------
Beginning execution loop
2846
2793
2850
Batch execution completed 3 times.

Completion time: 2022-01-18T22:08:50.3264912-06:00
Alan Burstein
  • 7,770
  • 1
  • 15
  • 18
1
  1. First create the function below to split the string.
  2. Then, execute the code that follows the function code.
    -- Function Code
    CREATE FUNCTION [dbo].[udf_SplitList]
        (   
                @InputString    varchar(MAX)    
            ,   @Separator      varchar(1)  
        )
        RETURNS @ValuesList TABLE ( ID int IDENTITY(1,1), Value varchar(MAX))
        AS
            BEGIN
                DECLARE @ListValue NVARCHAR(max)
                SET @InputString = @InputString + @Separator
                WHILE (LEN(@InputString) > 0)
                    BEGIN
                        SELECT @ListValue = SUBSTRING(@InputString , 1, CHARINDEX(@Separator, @InputString) - 1)
                        INSERT INTO @ValuesList
                        SELECT LTRIM(@ListValue)
                        SELECT @InputString = SUBSTRING(@InputString, CHARINDEX(@Separator, @InputString) + 1 , LEN(@InputString) - CHARINDEX(@Separator, @InputString)) 
                END 
                RETURN 
            END
    -- Execution Code
    DECLARE     @YourTable TABLE (ID int, CodeList varchar(MAX), ValueList varchar(MAX));
    INSERT INTO @YourTable  VALUES ( 1, 'a/b/c', 'apple/banana/cucumber');
    SELECT  X.*
    FROM    @YourTable  Y
    CROSS APPLY
        (
            SELECT 
                    Code  = C.Value
                ,   Value = V.Value
            FROM    dbo.udf_SplitList(Y.CodeList , '/') C
            JOIN    dbo.udf_SplitList(Y.ValueList, '/') V ON V.ID = C.ID
        ) X
    ;
LukStorms
  • 28,916
  • 5
  • 31
  • 45
Andy3B
  • 444
  • 2
  • 6
0

Without using function, you can use a tally/numbers table to do the same thing like below

See a working demo here

; with nums as
(
select 1 as num
union all 
select num +1 as num
from 
nums 
where num <80
)
select 
   X.id,
   substring(X.column1,X.b,X.e-X.b),
   substring(Y.column2,Y.b,Y.e-Y.b)
from 
(
    select 
        t.*,
        e=N.Num,
        r=row_number() over(partition by Id order by N.num),
        b=isnull(lag(N.num) over (partition by Id order by N.num),0)+1
     from t
        left join nums N 
            on charindex('/',column1+'/',N.num)=N.num 
)X
left join
(
    select 
        t.*,
        e= N.Num,
        r=row_number() over(partition by Id order by N.num),
        b=isnull(lag(N.num) over (partition by Id order by N.num),0)+1
    from t
        left join nums N 
             on charindex('/', column2+'/',N.num)=N.num 
)Y
       on X.id=Y.id and X.r=Y.r
DhruvJoshi
  • 17,041
  • 6
  • 41
  • 60