I know there are already a few answers and one has been accepted, but there are some vital performance factors to consider. If the number of values is always three (or always low, say five or six), the Cascading APPLY technique will be fastest by far. The solution below assumes there will always be three items; it can easily be modified to handle a variable but small number of items (a variation for that case follows the code).
CASCADING APPLY SOLUTION:
DECLARE @table TABLE
(
SomeId INT IDENTITY,
S1 VARCHAR(1000),
S2 VARCHAR(1000)
);
INSERT @table VALUES ('a/b/c','apple/banana/cucumber'),
('d/d2/f','dog/donkey/fish'),('x/y/z','x-ray/yo-yo/zeta');
SELECT
SomeId = f.SomeId,
Col1 = f2.C1,
Col2 = f2.C2
FROM
(
SELECT
t.SomeId,
SUBSTRING(t.S1, 1, c1.P-1),
SUBSTRING(t.S1, c1.P+1, c2.P - c1.P-1),
SUBSTRING(t.S1, c2.P+1, 8000),
SUBSTRING(t.S2, 1, c1.P2 - 1),
SUBSTRING(t.S2, c1.P2+1, c2.P2 - c1.P2-1),
SUBSTRING(t.S2, c2.P2+1, 8000)
FROM @table AS t
CROSS APPLY (VALUES(CHARINDEX('/',t.S1),CHARINDEX('/',t.S2))) AS c1(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c1.P+1),CHARINDEX('/',t.S2,c1.P2+1))) AS c2(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c2.P+1),CHARINDEX('/',t.S2,c2.P2+1))) AS c3(P,P2)
) AS f(SomeId,c1_1,c1_2,c1_3,c2_1,c2_2,c2_3)
CROSS APPLY (VALUES (c1_1, c2_1), (c1_2, c2_2),(c1_3,c2_3)) AS f2(c1,c2);
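To handle a variable but small number of items, one approach, as a rough sketch rather than a tuned implementation, is to pad each string with as many extra delimiters as the maximum item count so every CHARINDEX finds a position, end the last item at the third delimiter, and discard the empty slots at the end. This assumes real items are never empty strings; @table2 below is just a hypothetical sample with varying item counts.
DECLARE @table2 TABLE
(
SomeId INT IDENTITY,
S1 VARCHAR(1000),
S2 VARCHAR(1000)
);
INSERT @table2 VALUES ('a/b/c','apple/banana/cucumber'),('d/e','dog/elk'),('x','x-ray');
SELECT
SomeId = f.SomeId,
Col1 = f2.C1,
Col2 = f2.C2
FROM
(
SELECT
t.SomeId,
SUBSTRING(p.S1, 1, c1.P-1),
SUBSTRING(p.S1, c1.P+1, c2.P - c1.P-1),
SUBSTRING(p.S1, c2.P+1, c3.P - c2.P-1),
SUBSTRING(p.S2, 1, c1.P2-1),
SUBSTRING(p.S2, c1.P2+1, c2.P2 - c1.P2-1),
SUBSTRING(p.S2, c2.P2+1, c3.P2 - c2.P2-1)
FROM @table2 AS t
-- pad with three delimiters (the maximum item count) so each CHARINDEX below always finds a hit
CROSS APPLY (VALUES(t.S1+'///', t.S2+'///')) AS p(S1,S2)
CROSS APPLY (VALUES(CHARINDEX('/',p.S1),CHARINDEX('/',p.S2))) AS c1(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',p.S1,c1.P+1),CHARINDEX('/',p.S2,c1.P2+1))) AS c2(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',p.S1,c2.P+1),CHARINDEX('/',p.S2,c2.P2+1))) AS c3(P,P2)
) AS f(SomeId,c1_1,c1_2,c1_3,c2_1,c2_2,c2_3)
CROSS APPLY (VALUES (c1_1, c2_1), (c1_2, c2_2),(c1_3,c2_3)) AS f2(C1,C2)
WHERE f2.C1 <> ''; -- drop the padded, empty slots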
If you are using Azure SQL Database, you have STRING_SPLIT with the ordinal option.
DECLARE @table TABLE
(
SomeId INT IDENTITY,
S1 VARCHAR(1000),
S2 VARCHAR(1000)
);
INSERT @table VALUES ('a/b/c','apple/banana/cucumber'),
('d/d2/f','dog/donkey/fish'),('x/y/z','x-ray/yo-yo/zeta');
SELECT TOP(1) WITH TIES
t.SomeId, t.S1, t.S2, Col1 = split1.[value], Col2 = split2.[value]
FROM @table AS t
CROSS APPLY STRING_SPLIT(t.S1 ,'/') AS split1
CROSS APPLY STRING_SPLIT(t.S2 ,'/') AS split2
ORDER BY ABS(
ROW_NUMBER() OVER (PARTITION BY t.SomeId, split1.value ORDER BY split2.value) -
ROW_NUMBER() OVER (PARTITION BY t.SomeId, split2.value ORDER BY split1.value));
The query above only pairs items correctly when they happen to sort alphabetically in position order (not practical). On Azure, however, you can enable the ordinal (the third STRING_SPLIT argument, e.g. STRING_SPLIT(t.S1, '/', 1)) and change the ORDER BY to:
ORDER BY ABS(
ROW_NUMBER() OVER (PARTITION BY t.SomeId, split1.value ORDER BY split2.ordinal) -
ROW_NUMBER() OVER (PARTITION BY t.SomeId, split2.value ORDER BY split1.ordinal));
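With the ordinal column available you can also skip the ROW_NUMBER arithmetic entirely. A minimal sketch, assuming your edition supports the three-argument STRING_SPLIT (enable_ordinal), simply joins the two splits on position:
SELECT
t.SomeId,
Col1 = s1.value,
Col2 = s2.value
FROM @table AS t
CROSS APPLY STRING_SPLIT(t.S1, '/', 1) AS s1 -- the third argument enables the ordinal column
CROSS APPLY STRING_SPLIT(t.S2, '/', 1) AS s2
WHERE s1.ordinal = s2.ordinal; -- pair the items by position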
Comparing ALL techniques posted
Now let's compare all the solutions posted thus far to understand the dramatic performance differences. I built a basic test harness and ran it, first with 100K rows, then with a million rows. The solutions are:
- dbo.udf_SplitList by Andy3B
- dbo.fnString_Split by LukStorms
- a Recursive CTE - Numbers/tally table solution by DhruvJoshi
- an improved version of DhruvJoshi's solution with a faster tally table
- Yitzhak Khabinsky's JSON solution
- My Cascading APPLY solution
The dbo.udf_SplitList solution is the slowest at 246 seconds for a million rows. Scalar functions are dreadfully slow, and the recursion makes things much worse here.
The dbo.fnString_Split solution does much better at 85 seconds. To do better still, we need to lose the scalar UDFs.
DhruvJoshi's recursive CTE solution gets us down to 65 seconds, roughly a 25% improvement. I improved his solution by rewriting the numbers table to avoid recursion; that shaves off another 30% or so, down to 45 seconds.
Yitzhak Khabinsky's is the first set-based solution; he leverages JSON. Here we get close to a 3x boost, down to 16 seconds. The Cascading APPLY solution improves on Yitzhak's by more than 5x again, down to under three seconds.
Watch out for scalar UDFs, recursion used for counting, and loops. Set-based always rules.
IF OBJECT_ID('tempdb..#table') IS NOT NULL DROP TABLE #table;
GO
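-- Build the test data: 100K rows with a fixed three-part S1 and a random three-part S2.
-- Change TOP(100000) to TOP(1000000) for the million-row run.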
SELECT TOP(100000)
SomeId = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
S1 = 'A/B/C',
S2 = REPLACE(LEFT(NEWID(),18),'-','/')
INTO #table
FROM sys.all_columns, sys.all_columns a
GO
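-- Note: dbo.udf_SplitList and dbo.fnString_Split are the functions posted in the other answers;
-- they must already exist in the database for the first two tests to run.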
PRINT CHAR(10)+'dbo.udf_SplitList -Andy3B'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
SELECT @ID = t.SomeId, @C1 = x.Code, @C2 = x.Value
FROM #table AS t
CROSS APPLY
(
SELECT Code = C.Value,
Value = V.Value
FROM dbo.udf_SplitList(t.S1, '/') AS C
JOIN dbo.udf_SplitList(t.S2, '/') AS V ON V.ID = C.ID
) AS X;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10)+'dbo.fnString_Split - LukStorms'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
select @ID = t.SomeId, @C1 = ca.Column1, @C2 = ca.Column2
from #table AS t
cross apply (
select
s1.ordinal
, s1.value as Column1
, s2.value as Column2
from dbo.fnString_Split(t.S1,'/') as s1
join dbo.fnString_Split(t.S2,'/') as s2
on s1.ordinal = s2.ordinal) AS ca;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10)+'Recursive CTE - TALLY TABLE DhruvJoshi'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
;with nums as
(
select 1 as num union all
select num +1 as num
from nums
where num <80
)
select
@ID = X.SomeId,
@C1 = substring(X.S1,X.b,X.e-X.b),
@C2 = substring(Y.S2,Y.b,Y.e-Y.b)
from
(
select
t.*,
e=N.Num,
r=row_number() over (partition by t.SomeId order by N.num),
b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0)+1
from #table AS t
left join nums AS N
on charindex('/',t.S1+'/',N.num)=N.num
) AS X
left join
(
select
t.*,
e= N.Num,
r=row_number() over (partition by t.SomeId order by N.num),
b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0)+1
from #table AS t
left join nums AS N
on charindex('/', t.S2+'/',N.num)=N.num
) AS Y
ON X.SomeId = Y.SomeId and X.r=Y.r;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10)+'TALLY TABLE without Recursion DhruvJoshi'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
;with nums as
(
SELECT num = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e1(x),
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e2(x),
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e3(x),
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e4(x)
)
select
@ID = X.SomeId,
@C1 = substring(X.S1,X.b,X.e-X.b),
@C2 = substring(Y.S2,Y.b,Y.e-Y.b)
from
(
select
t.*,
e=N.Num,
r=row_number() over (partition by t.SomeId order by N.num),
b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0)+1
from #table AS t
left join nums AS N
on charindex('/',t.S1+'/',N.num)=N.num
WHERE N.num <= LEN(t.S1) + 1
) AS X
left join
(
select
t.*,
e= N.Num,
r=row_number() over (partition by t.SomeId order by N.num),
b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0)+1
from #table AS t
left join nums AS N
on charindex('/', t.S2+'/',N.num)=N.num
WHERE N.num <= LEN(t.S2) + 1
) AS Y
ON X.SomeId = Y.SomeId and X.r=Y.r;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10)+'JSON - Yitzhak Khabinsky'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
DECLARE @separator CHAR(1) = '/';
WITH rs AS
(
SELECT *
, ar1 = '["' + REPLACE(t.S1, @separator, '","') + '"]'
, ar2 = '["' + REPLACE(t.S2, @separator, '","') + '"]'
FROM #table AS t
)
SELECT @id = SomeID, @C1 = ColB.[value], @C2 = ColC.[value]
FROM rs AS rs
CROSS APPLY OPENJSON (ar1, N'$') AS ColB
CROSS APPLY OPENJSON (ar2, N'$') AS ColC
WHERE ColB.[key] = ColC.[key];
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10)+'CROSS APPLY TECHNIQUE'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
SELECT
@ID = f.SomeId,
@C1 = f2.C1,
@C2 = f2.C2
FROM
(
SELECT
t.SomeId,
SUBSTRING(t.S1, 1, c1.P-1),
SUBSTRING(t.S1, c1.P+1, c2.P - c1.P-1),
SUBSTRING(t.S1, c2.P+1, 8000),
SUBSTRING(t.S2, 1, c1.P2 - 1),
SUBSTRING(t.S2, c1.P2+1, c2.P2 - c1.P2-1),
SUBSTRING(t.S2, c2.P2+1, 8000)
FROM #table AS t
CROSS APPLY (VALUES(CHARINDEX('/',t.S1),CHARINDEX('/',t.S2))) AS c1(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c1.P+1),CHARINDEX('/',t.S2,c1.P2+1))) AS c2(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c2.P+1),CHARINDEX('/',t.S2,c2.P2+1))) AS c3(P,P2)
) AS f(SomeId,c1_1,c1_2,c1_3,c2_1,c2_2,c2_3)
CROSS APPLY (VALUES (c1_1, c2_1), (c1_2, c2_2),(c1_3,c2_3)) AS f2(c1,c2);
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
100K row Test Results:
dbo.udf_SplitList -Andy3B
------------------------------------------------------------------------------------------
Beginning execution loop
24593
24566
24530
Batch execution completed 3 times.
dbo.fnString_Split - LukStorms
------------------------------------------------------------------------------------------
Beginning execution loop
8147
8260
8257
Batch execution completed 3 times.
Recursive CTE - TALLY TABLE DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
6867
6733
6850
Batch execution completed 3 times.
TALLY TABLE without Recursion DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
4677
4630
4620
Batch execution completed 3 times.
JSON - Yitzhak Khabinsky
------------------------------------------------------------------------------------------
Beginning execution loop
1667
1653
1670
Batch execution completed 3 times.
CROSS APPLY TECHNIQUE
------------------------------------------------------------------------------------------
Beginning execution loop
283
280
284
Batch execution completed 3 times.
1 million row Test Results:
dbo.udf_SplitList -Andy3B
------------------------------------------------------------------------------------------
Beginning execution loop
246057
245296
247017
Batch execution completed 3 times.
dbo.fnString_Split - LukStorms
------------------------------------------------------------------------------------------
Beginning execution loop
85340
83010
83674
Batch execution completed 3 times.
Recursive CTE - TALLY TABLE DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
67226
64910
64740
Batch execution completed 3 times.
TALLY TABLE without Recursion DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
46777
44630
44623
Batch execution completed 3 times.
JSON - Yitzhak Khabinsky
------------------------------------------------------------------------------------------
Beginning execution loop
16710
16830
16520
Batch execution completed 3 times.
CROSS APPLY TECHNIQUE
------------------------------------------------------------------------------------------
Beginning execution loop
2846
2793
2850
Batch execution completed 3 times.
Completion time: 2022-01-18T22:08:50.3264912-06:00