@InStr = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0'

I have the string above in a variable @InStr and I want to use STRING_SPLIT to insert the values into a table.

As you can see, it's a double split.

SELECT Value FROM STRING_SPLIT(@InStr,'^') 

Produces:

0|ABC|3033.9|3032.4444|0|0|0
1|DEF|3033.2577|3033.053|3032.0808|0|0
2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0
3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0

Which is good. Now I need to take each row and insert it into a table.

I'm not sure how to combine the two splits to do the insert. The table has seven columns to populate.

Any help appreciated.

Dale K
pju

5 Answers

2

Instead of passing a string from .NET like 'a|b|c^d|e|f' and then having to parse it, leave it in its original structure (DataTable?) and create a table type in SQL Server. Then you can pass in your structure instead of this cobbled-together string.

In SQL Server:

CREATE TYPE dbo.MyTableType AS TABLE
(
  ColumnA int,
  ColumnB nvarchar(32), 
  ...
);
GO

CREATE PROCEDURE dbo.ShowArray
  @DataTable dbo.MyTableType
AS
BEGIN
  SET NOCOUNT ON;

  SELECT ColumnA, ColumnB, ...
    FROM @DataTable;
END

In C# (untested and incomplete):

DataTable dt = new DataTable();
dt.Columns.Add("ColumnA", typeof(Int32));
dt.Columns.Add("ColumnB", typeof(String));
...

DataRow dr = dt.NewRow();  
dr[0] = 1;  
dr[1] = "foo";
...

dt.Rows.Add(dr);
...

  SqlCommand cmd = new SqlCommand("dbo.ShowArray", connectionObject);
  cmd.CommandType = CommandType.StoredProcedure;
  SqlParameter tvp1 = cmd.Parameters.AddWithValue("@DataTable", dt);
  tvp1.SqlDbType = SqlDbType.Structured;
  ...

More on this shift away from splitting strings here and, actually, in this answer as well.

Aaron Bertrand
2

First of all: you should avoid STRING_SPLIT() in almost every case. It does not guarantee to return the items in the expected sort order, so code that passes all your tests can still break in production with silly, hard-to-find errors.

There are various answers already; the best is probably the table-valued parameter. But if you cannot follow that route, I'd like to suggest two type-safe approaches:

DECLARE @InStr NVARCHAR(MAX) = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0';

--xml approach (working for almost any version)
--We do the double split in one single action and return a nested XML with <x> and <y> elements
--We can fetch the values type-safe from their 1-based position:

SELECT x.value('y[1]','int')           AS [First]
      ,x.value('y[2]','varchar(100)')  AS [Second]
      ,x.value('y[3]','decimal(28,8)') AS Third
      ,x.value('y[4]','decimal(28,8)') AS Fourth
      ,x.value('y[5]','decimal(28,8)') AS Fifth
      ,x.value('y[6]','decimal(28,8)') AS Sixth
      ,x.value('y[7]','decimal(28,8)') AS Seventh
FROM (VALUES(CAST('<x><y>' + REPLACE(REPLACE(@Instr,'|','</y><y>'),'^','</y></x><x><y>') + '</y></x>' AS XML)))v(Casted)
CROSS APPLY Casted.nodes('/x') b(x);
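If you want to sanity-check the REPLACE chain outside SQL, here is a Python sketch of the same transformation on the sample string (the extra root element is only there because ElementTree requires a single root):

```python
import xml.etree.ElementTree as ET

instr = ('0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0'
         '^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0'
         '^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0')

# Same transformation as the T-SQL: '|' -> </y><y>, '^' -> </y></x><x><y>
body = '<x><y>' + instr.replace('|', '</y><y>').replace('^', '</y></x><x><y>') + '</y></x>'
root = ET.fromstring('<r>' + body + '</r>')   # wrap in a root so the parser accepts it

rows = [[y.text for y in x] for x in root]    # one list per <x>, one item per <y>
```

Each of the four `<x>` rows yields seven `<y>` values, matching the seven `x.value()` calls above.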

--json approach (needs v2016+)
--faster than XML
--We transform your string to a JSON-array with one item per row and use another OPENJSON to retrieve the array's items.
--The WITH-clause brings in implicit pivoting to retrieve the items type-safe as columns:

SELECT b.*
FROM OPENJSON(CONCAT('[["',REPLACE(@Instr,'^','"],["'),'"]]')) a
CROSS APPLY OPENJSON(CONCAT('[',REPLACE(a.[value],'|','","'),']'))
WITH([First]   INT           '$[0]'
    ,[Second]  VARCHAR(100)  '$[1]'
    ,[Third]   DECIMAL(28,8) '$[2]'
    ,[Fourth]  DECIMAL(28,8) '$[3]'
    ,[Fifth]   DECIMAL(28,8) '$[4]'
    ,[Sixth]   DECIMAL(28,8) '$[5]'
    ,[Seventh] DECIMAL(28,8) '$[6]') b;
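The bracket-and-quote gymnastics are easier to see outside SQL; this Python sketch performs the same two REPLACE/CONCAT steps on the sample string:

```python
import json

instr = ('0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0'
         '^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0'
         '^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0')

# Outer step: each '^'-separated row becomes a one-element JSON array.
outer = '[["' + instr.replace('^', '"],["') + '"]]'      # [["row0"],["row1"],...]

# Inner step: split each row string on '|' into its own JSON array,
# mirroring the second OPENJSON call.
matrix = [json.loads('["' + row[0].replace('|', '","') + '"]')
          for row in json.loads(outer)]
```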

Both approaches return the same result:

+-------+--------+---------------+---------------+---------------+---------------+------------+
| First | Second | Third         | Fourth        | Fifth         | Sixth         | Seventh    |
+-------+--------+---------------+---------------+---------------+---------------+------------+
| 0     | ABC    | 3033.90000000 | 3032.44440000 | 0.00000000    | 0.00000000    | 0.00000000 |
+-------+--------+---------------+---------------+---------------+---------------+------------+
| 1     | DEF    | 3033.25770000 | 3033.05300000 | 3032.08080000 | 0.00000000    | 0.00000000 |
+-------+--------+---------------+---------------+---------------+---------------+------------+
| 2     | JHI    | 3032.83760000 | 3033.25960000 | 3033.22590000 | 3033.32200000 | 0.00000000 |
+-------+--------+---------------+---------------+---------------+---------------+------------+
| 3     | XYZ    | 3032.83760000 | 3032.83760000 | 3032.83760000 | 3032.83760000 | 0.00000000 |
+-------+--------+---------------+---------------+---------------+---------------+------------+
Shnugo
  • Nice, I still tend to forget that OPENJSON can retain order, but I think it's the replace and concat gymnastics that make me forget. – Aaron Bertrand Jun 10 '20 at 12:43
1

You can use a recursive CTE:

declare @instr varchar(max) = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0'
;
with cte as (
      select row_number() over (order by (select null)) as id, convert(varchar(max), null) as el, Value + '|' as rest, 0 as lev
      from string_split(@InStr, '^')
      union all
      select id, left(rest, charindex('|', rest) - 1),
             stuff(rest, 1, charindex('|', rest), ''),
             lev + 1
      from cte
      where rest <> ''
     )
select max(case when lev = 1 then el end),
       max(case when lev = 2 then el end),
       max(case when lev = 3 then el end),
       max(case when lev = 4 then el end),
       max(case when lev = 5 then el end),
       max(case when lev = 6 then el end),
       max(case when lev = 7 then el end)
from cte
group by id;
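The recursive member peels off one element per iteration; the same LEFT/CHARINDEX/STUFF logic can be sketched as a plain loop in Python:

```python
def split_row(rest):
    # The anchor member appends a trailing '|'; each iteration takes the
    # text before the first '|' (LEFT/CHARINDEX) and removes it plus the
    # delimiter (STUFF), until nothing is left.
    rest += '|'
    elements = []
    while rest != '':
        pos = rest.index('|')
        elements.append(rest[:pos])
        rest = rest[pos + 1:]
    return elements
```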

Here is a db<>fiddle.

Unfortunately, you can't rely on string_split() alone here, because it does not return the position (offset) of each value.

Gordon Linoff
  • Dale, I guess there isn't an elegant way to pass a 2 dimensional array through a stored procedure and store it? – pju Jun 10 '20 at 02:18
  • @pju Where is your "2 dimensional array" coming from? Why not use a [table-valued parameter](https://sqlperformance.com/2012/08/t-sql-queries/splitting-strings-now-with-less-t-sql) and pass in a data structure (like a DataTable) from your application? Then no parsing or splitting or array simulation... it's already a table. – Aaron Bertrand Jun 10 '20 at 02:20
  • Coming from .NET (c# or vb) – pju Jun 10 '20 at 02:26
  • @pju "Coming from .Net" should have been in the question. Since you're using .Net, Aaron Bertrand's answer is the best one here. – Zohar Peled Jun 10 '20 at 05:56
  • @pju . . . This question is about string parsing (which this question answers). If what you really want to do is pass a table to a stored procedure -- well, you didn't ask that question. But Aaron's comment should point you in the right direction. – Gordon Linoff Jun 10 '20 at 11:08
0

For a subsequent splitting of pipe-separated substrings you can utilise openjson(), as demonstrated in the example below:

declare @InStr varchar(max) = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0';

select p.*
from (
    select ss.value as [RowId], oj.[key] as [ColumnId], oj.value as [ColumnValue]
    from string_split(@InStr,'^') ss
        cross apply openjson('["' + replace(ss.value, '|', '","') + '"]', '$') oj
) q
pivot (
    min(q.ColumnValue)
    for q.[ColumnId] in ([0], [1], [2], [3], [4], [5], [6])
) p;
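The PIVOT step simply turns the (RowId, ColumnId, ColumnValue) triples back into wide rows; the same reshaping looks like this in Python (a sketch on the sample string):

```python
instr = ('0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0'
         '^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0'
         '^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0')

# Long form: one (row_id, column_id, value) triple per element,
# like the inner query feeding the PIVOT.
triples = [(r, c, v)
           for r, row in enumerate(instr.split('^'))
           for c, v in enumerate(row.split('|'))]

# Wide form: group by row_id and spread column_id 0..6 into positions.
wide = {}
for r, c, v in triples:
    wide.setdefault(r, [None] * 7)[c] = v
```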

There are many caveats with this approach, however. The most prominent are:

  1. You need SQL Server 2016 or later, and the database compatibility level needs to be 130 or above;
  2. If your data is of any size worth mentioning (1 MB+), this code might be unacceptably slow. String manipulation is not SQL Server's strong point.

Personally, I would recommend parsing this string outside of SQL. If it's a flat file you are importing, an SSIS data flow will be much easier to develop and faster to run. If it's an application, then redesign it to pass either a suitable table type or, at the very least, an XML / JSON blob.

Roger Wolf
  • Pointing to `OPENJSON()` as replacement for `STRING_SPLIT()` is a good thing, but there are two enhancements: 1) I'd use `CONCAT()` rather than the `+` operator to chain the strings, and 2) You can use **doubled brackets** (`[[`), which allows for a position and type-safe `WITH`-clause (no `PIVOT` needed). I've placed an answer here in this thread and on more detailed answer [here](https://stackoverflow.com/a/38274439/5089204) – Shnugo Jun 10 '20 at 07:09
  • @Shnugo, I don't use `concat()` anymore, as I've witnessed a huge performance drop caused by this function once, compared to the standard string concatenation. – Roger Wolf Jun 10 '20 at 08:12
0

I generate an INSERT statement and then execute it: first I split the string on the row delimiter, then I build the INSERT statement from the pieces.

Note:

  • I am assuming that the second column is a three-letter code.
  • I am assuming that the sort order of rows doesn't matter.

declare @instr varchar(max) = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0'
;
declare @insertStmt VARCHAR(max) ='INSERT INTO TABLEName VALUES '+ CHAR(13) + CHAR(10);
SELECT @insertStmt +=  CONCAT('(',replace(stuff(stuff(value,3,0,''''),7,0,''''),'|',','),'),') 
from STRING_SPLIT(@instr,'^') 
SELECT @insertStmt = STUFF(@insertStmt,len(@insertStmt),1,'')
select @insertStmt
EXEC(@insertStmt)
INSERT INTO TABLEName VALUES 
(0,'ABC',3033.9,3032.4444,0,0,0),(1,'DEF',3033.2577,3033.053,3032.0808,0,0),(2,'JHI',3032.8376,3033.2596,3033.2259,3033.322,0),(3,'XYZ',3032.8376,3032.8376,3032.8376,3032.8376,0)
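For comparison, the same statement can be generated by splitting instead of STUFF-ing quotes in at fixed offsets (a Python sketch using the answer's TABLEName; unlike the fixed positions 3 and 7, this keeps working if the code is not exactly three characters):

```python
instr = ('0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0'
         '^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0'
         '^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0')

values = []
for row in instr.split('^'):
    cols = row.split('|')
    cols[1] = "'" + cols[1] + "'"          # quote the code column
    values.append('(' + ','.join(cols) + ')')

stmt = 'INSERT INTO TABLEName VALUES \n' + ','.join(values)
```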
Venkataraman R
  • Very dangerous! This might work in your system and pass all tests and still break in production. `STRING_SPLIT()` does not guarantee to return the items in the expected sort order... Furthermore, your *stuffing* into fixed positions is quirky... – Shnugo Jun 10 '20 at 07:05
  • ... addition: Instead of `STRING_SPLIT()` we should use `OPENJSON()` (I've placed an answer here too and a more detailed one [here](https://stackoverflow.com/a/38274439/5089204)). – Shnugo Jun 10 '20 at 07:11
  • @Shnugo, I am not splitting columns in STRING_SPLIT. I am splitting the rows. So, sort order is not a problem here. Another thing is, as I have mentioned, I am assuming that second column is 3 character code, from the test data, OP has put. If it is > 3 characters, we have to use PATINDEX or some other approach. – Venkataraman R Jun 10 '20 at 07:21
  • We do not know 1) whether the rows' sort order matters and 2) whether this code is always 3 characters. In general I'd never rely on this. Two years later someone decides to use 4 characters and suddenly your application needs to be reworked in a thousand places... So your last sentence brings it to the point: *some other approach* :-) – Shnugo Jun 10 '20 at 07:33
  • @Shnugo, I completely agree with you. I have added note about my assumptions. – Venkataraman R Jun 10 '20 at 07:38