Query to update strings using string_split function

Question

I am trying to update column in table where data is in below format:

Id | ColA
----------
1    Peter,John:Ryan,Jack:Evans,Chris
2    Peter,John:Ryan,Jack
3    Hank,Tom
4    
5    Cruise,Tom

I need to split the string by ':' and remove ',' and need to reverse the name and again append the same data separated by: and finally data should be as shown

Id | ColA
----------
1    John Peter:Jack Ryan:Chris Evans
2    John Peter:Jack Ryan
3    Tom Hank
4    
5    Tom Cruise

Please let me know how can we achieve this I tried to use Replace and Substring but how can we do it if we have data some are separated by two colon and some are separated by single colon. Is there any way to identify and achieve the data in the above formatted one.

While asking a question, you need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example): (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in #1 above. (4) Your SQL Server version (SELECT @@version;). — Yitzhak Khabinsky, Jul 16 '21 at 01:49
Normalize your schema. See ["Is storing a delimited list in a database column really that bad?"](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) (Spoiler: Yes, it is.). — sticky bit, Jul 20 '21 at 01:08

trenton-ftw · Answer 1 · 2021-07-16T21:51:20.703

0

Something like this should work:

CREATE TABLE YourTableNameHere (
    Id int NULL 
    ,ColA varchar(1000) NULL 
);
INSERT INTO YourTableNameHere (Id,ColA) VALUES
(1, 'Peter,John:Ryan,Jack:Evans,Chris')
,(2, 'Peter,John:Ryan,Jack')
,(3, 'Hank,Tom')
,(4, '')
,(5, 'Cruise,Tom');

SELECT 
    tbl.Id 
    ,STUFF((SELECT 
                CONCAT(':'
                    ,RIGHT(REPLACE(ss.value, ',', ' '), LEN(REPLACE(ss.value, ',', ' ')) -  CHARINDEX(' ', REPLACE(ss.value, ',', ' '), 1)) /*first name*/ 
                    ,' '
                    ,CASE WHEN CHARINDEX(',', ss.value, 1) > 1 THEN  LEFT(REPLACE(ss.value, ',', ' '), CHARINDEX(' ', REPLACE(ss.value, ',', ' '), 1) - 1) /*last name*/ ELSE '' END)
            FROM 
                YourTableNameHere AS tbl_inner
                CROSS APPLY string_split(tbl_inner.ColA, ':') AS ss
            WHERE 
                tbl_inner.Id = tbl.Id
            FOR XML PATH('')), 1, 1, '') AS ColA 
FROM 
    YourTableNameHere AS tbl;

This uses the string_split function within a FOR XML clause to split the values in ColA by the : character, then replace the , with a space, parse to the left and right of the space, then recombine the parsed values delimited by a : character.

One thing to note here, per Microsoft the output of string_split is not guaranteed to be in the same order as the input:

Note
The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.

So in order to guarantee the output of this function is going to concatenate the names back in the same order that they existed in the input column you would either need to implement your own function to split the string or come up with some criteria for combining them in a certain order. For example, you could recombine them in alphabetical order by adding ORDER BY ss.value to the inner query for ColA in the final result set. In my testing using your input the final values were ordered the same as the input column, but it is worth noting that that behaviour is not guaranteed and in order to guarantee it then you need to do more work.

edited Jul 16 '21 at 21:51

answered Jul 16 '21 at 00:25

trenton-ftw

950
5
14

Are you on SQL Server 2016 by any chance? That's the only one with `string_split` but not `string_agg` – Charlieface Jul 16 '21 at 00:46
@Charlieface I work on a lot of different instances :) but OP tagged `sql-server-2008-r2`. Assuming that he isn't actually on that version since that version wouldn't have `string_split` anyways (although he specifically requested its usage here). But at least the rest would be compatible. OP will likely need to implement custom function anyhow due to ordering issues with `string_split` I mentioned. That would make the whole thing 2008 R2 compatible. `string_agg` is a good call here tho. I can edit with `string_agg` if needed. – trenton-ftw Jul 16 '21 at 00:52
1

Also, the most correct `FOR XML` incantation goes like this `FOR XML PATH(''), TYPE).value('text()[1]','nvarchar(max)'), 1, LEN(yourseparator), '')` this prevents XML escaping and ensures the correct length gets chopped – Charlieface Jul 16 '21 at 00:59
@Charlieface can you point to a source for that assertion? I RARELY see people use that sort of incantation unless it is required due to the input data. Using the `TYPE` directive in this manner incurs additional cost and using that incantation relies on an implicit conversion. Something I would never recommend. – trenton-ftw Jul 16 '21 at 01:14
https://dba.stackexchange.com/questions/63445/avoiding-entitized-characters-when-using-for-xml-path-for-string-concatenation or here https://www.red-gate.com/simple-talk/sql/t-sql-programming/concatenating-row-values-in-transact-sql/#Toc205129484 if you prefer an official article, there are many others examples. In XML you need to escape common characters like `'` `"` `&` it's far better to just ensure it always gets done correctly. I don;t believe there is any implicit conversion involved, `TYPE` means it goes straight into `xml` datatype – Charlieface Jul 16 '21 at 01:23
As pointed out, both of those articles mention that you can use that method to avoid issues with characters being entitized. Neither one says something along the lines of 'don't use that use this'. Its a case by case basis thats dependent on your input data. IMO its still not worth the additional overhead unless there is a possibility those characters will exist in your data. Its not an equivalent incantation, it uses different functionality for different use cases. Also I get two implicit conversions using your provided `FOR XML` incantation. I can paste the plan if you would like to see. – trenton-ftw Jul 16 '21 at 01:43
'TYPE means it goes straight into xml datatype', yes but it has to come back to varchar(max) for this query to work. I believe the implicit conversion comes from the `value` operator within the XML reader that gets instantiated to retrieve the value as varchar(max) from the returned `xml`. Obviously not a huge issue as far actual performance goes, but I still hate those being anywhere in a plan unless it is required for the functionality. – trenton-ftw Jul 16 '21 at 01:49
I tried with above code,but it's only working for the first part in the string' like John Peter:Ryan Jack'.second part after ':' text is not swapping – AMDI Jul 16 '21 at 20:50
@AMDI I am confused? I have edited my answer to add sample data to the script in the table `YourTableNameHere`. The output is exactly as you have requested. Can you clarify what exactly this is missing? – trenton-ftw Jul 16 '21 at 21:52
@trenton-ftw- I am expecting the output as John Peter:Jack Ryan :Chris: Evan.Unfortunately,I cant use string_split as its not supported – AMDI Jul 19 '21 at 23:47

Yitzhak Khabinsky · Answer 2 · 2021-07-16T03:10:13.410

Here is a solution for SQL Server 2008 onwards.

It is based on XML and XQuery.

Using XQuery's FLWOR expression allows to tokenize odd vs. even XML elements. The rest is just a couple of the REPLACE() function calls to compose the desired output.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));
INSERT INTO @tbl (tokens) VALUES
('Peter,John:Ryan,Jack:Evans,Chris'),
('Peter,John:Ryan,Jack'),
('Hank,Tom'),
(''),
('Cruise,Tom');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ':'
    , @comma CHAR(1) = ',';

SELECT ID, tokens
   , REPLACE(REPLACE(c.query('
       for $x in /root/r[position() mod 2 eq 0]
       let $pos := count(root/r[. << $x]) 
       return concat($x, sql:variable("@comma"), (/root/r[$pos])[1])
       ').value('text()[1]', 'VARCHAR(8000)')
       , SPACE(1), @separator), @comma, SPACE(1)) AS result
FROM @tbl
    CROSS APPLY (SELECT CAST('<root><r><![CDATA[' + 
            REPLACE(REPLACE(tokens,@comma,@separator), @separator, ']]></r><r><![CDATA[') + 
            ']]></r></root>' AS XML)) AS t1(c)
ORDER BY ID;

Output

+----+----------------------------------+----------------------------------+
| ID |              tokens              |              result              |
+----+----------------------------------+----------------------------------+
|  1 | Peter,John:Ryan,Jack:Evans,Chris | John Peter:Jack Ryan:Chris Evans |
|  2 | Peter,John:Ryan,Jack             | John Peter:Jack Ryan             |
|  3 | Hank,Tom                         | Tom Hank                         |
|  4 |                                  | NULL                             |
|  5 | Cruise,Tom                       | Tom Cruise                       |
+----+----------------------------------+----------------------------------+

SQL #2 (don't try it, it won't work)

Unfortunately, SQL Server doesn't fully support even XQuery 1.0 standard. XQuery 3.1 is the latest standard. XQuery 1.0 functions fn:substring-after() and fn:substring-before() are badly missing.

In a dream world a solution would be much simpler, along the following:

SELECT *
    , c.query('
    for $x in /root/r
    return concat(fn:substring-after($x, ","), ",", fn:substring-before($x, ","))
    ') 
FROM @tbl
    CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
            REPLACE(tokens, @separator, ']]></r><r><![CDATA[') + 
            ']]></r></root>' AS XML)) AS t1(c);

Please up-vote the following suggestion to improve SQL Server: SQL Server vNext (post 2019) and NoSQL functionality

It became one of the most popular requests for SQL Server. The current voting tally is 590 and counting.

Thanks for suggestion.Some how I am getting error like XML Parsing:line 1, 'Incorrect document syntax' — AMDI, Jul 16 '21 at 20:46
@AMDI, Please edit your original question and add there "DDL and sample data population" section exactly like in my answer. — Yitzhak Khabinsky, Jul 16 '21 at 21:03
I was able to run the provided script (straight copy and paste) from this answer and the output provided in this answer matches the output of the script provided. — trenton-ftw, Jul 16 '21 at 21:54
@trenton-ftw, It seems that the OP has a different data set... Not what was provided in the question. — Yitzhak Khabinsky, Jul 16 '21 at 21:58
@YitzhakKhabinsky I think you are right. In your solution you could replace `CAST` with `TRY_CAST`. OP should be able to figure out what rows have the offending data with a filter like `WHERE tokens IS NOT NULL AND tokens <> '' AND c IS NULL`. I'm sure you are aware of this but might be useful for OP to troubleshoot or at least find the tokens that are problematic with this solution. — trenton-ftw, Jul 16 '21 at 22:54
@trenton-ftw, The `TRY_CAST()` function is available on SQL SERVER 2012 onwards. I tried to stick to SQL Server 2008 as labeled in the question. By the way, please up-vote my suggestion. Its link is at the bottom of my answer. — Yitzhak Khabinsky, Jul 16 '21 at 23:13

Query to update strings using string_split function

2 Answers2