Trying to understand the solution to the puzzle convert comma-separated values to a table using Cross Apply & XML

Question

I am not familiar with XML functions nor Outer Apply in SQL. Can someone dumb this solution down? I am having hard time understanding it.

Puzzle Link

SELECT Id, SplitedValue 
FROM
    (SELECT 
         ID, 
         CAST(('<X>'+replace(e.VALUE,',' ,'</X><X>')+'</X>') AS XML) AS xmlcol  
     FROM TestCommaUsingCrossApply e) s
OUTER APPLY
    (SELECT 
         ProjectData.D.value('.', 'varchar(100)') AS SplitedValue
     FROM 
         s.xmlcol.nodes('X') AS ProjectData(D)) a

This is a lengthy [discussion](https://stackoverflow.com/questions/23498284/why-is-cross-apply-needed-when-using-xpath-queries) that may help you. I don't understand this well enough to provide a specific explanation. — Isaac, Mar 31 '20 at 21:59
Whoever put that forward as a solution needs to be told it's broken. It doesn't deal with perfectly valid CSV like `"Hello, World!",I'm a separate field,too!` which is three fields: `Hello, World!`, `I'm a separate field` and `too!`. Comma-separated values is well documented in [RFC4180](https://tools.ietf.org/html/rfc4180) — AlwaysLearning, Mar 31 '20 at 22:12

Martin Smith · Answer 1 · 2020-03-31T22:55:14.507

As mentioned in a comment this is not a proper CSV parser. It is just a simple string splitter, but that is all that seems to be intended in the "puzzle" anyway.

Even for that though this method is quite inefficient and also only works reliably if the input is guaranteed to be fairly well sanitised and not contain characters of special significance in XML.

If you are on a modern version of SQL Server you should forget about this method and just use the following to achieve the same

SELECT ID,ss.value
FROM  TestCommaUsingCrossApply e
OUTER APPLY STRING_SPLIT(e.VALUE,',') ss

How the approach in the question works as a string splitter is quite simple.

+----------------------------------------------+--------------------------+
|                 expression                   |          value           |
+----------------------------------------------+--------------------------+
| e.VALUE                                      | 1,2,3                    |
| replace(e.VALUE,',' ,'</X><X>')              | 1</X><X>2</X><X>3        |
| '<X>'+replace(e.VALUE,',' ,'</X><X>')+'</X>' | <X>1</X><X>2</X><X>3</X> |
+----------------------------------------------+--------------------------+

Consider e.VALUE contains the string 1,2,3. Replacing the commas with </X><X> leads to the result in the second row above.

It adds a closing tag to close off the previous element and an opening tag before the next element. However the very first element is missing an open tag and the very last element missing a closing tag so these are rather inelegantly bolted on with concatenation to achieve the result in the third row.

That is a valid XML fragment so it can be cast to XML.

SELECT ProjectData.D.value('.', 'varchar(100)') as SplitedValue
FROM s.xmlcol.nodes('X') as ProjectData(D)

then creates a tabular result with one row for each <X> element and the . returns the text content of the element.

The OUTER APPLY allows this to be done for each row of the outer input.

Trying to understand the solution to the puzzle convert comma-separated values to a table using Cross Apply & XML

1 Answers1