0

I would expect the cdata tags to wrap the text value of col2 element but it is not doing so. I was using xml raw but because i got hit with some invalid xml characters in the data i now have to wrap everything with the cdata tags so I guess the only way to do this is by using explicit mode. Without the insert into the table the cdata tags are shown but the insert into the table removes them, I suspect it has something to do with the automatic xml data type conversion. Also the order by is not sorting by the numeric data type, it is doing a string sort


-- drop table Table1
-- truncate table Table1
create table dbo.Table1
(
  RowId int not null identity,
  BatchId int not null,    
  RequirementId int not null,   
  DataSheetName nvarchar(max) null,
  TestCaseName  nvarchar(max) null,
  GridStart nvarchar(max) null,
  GridId int not null,
  Col1  nvarchar(max) null,
  Col2  nvarchar(max) null
) 

create table results (
    GridName nvarchar(max),
    GridXML xml,
)


insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','Grid', 1,'','Deal Number')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','1', 1,'r1','r1c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','2', 1,'r2','r2c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','3', 1,'r3','r3c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','4', 1,'r4','r4c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','5', 1,'r5','r5c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','6', 1,'r6','r6c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','7', 1,'r7','r7c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','8', 1,'r8','r8c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','9', 1,'r9','r9c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','10', 1,'r10','r10c1')
insert into Table1(BatchId, RequirementId, DataSheetName,TestCaseName,GridName,GridStart,GridId, Col1,Col2)
values(1,1,'DataSheet','TestCase1','Grid1','END', 1,'','')


   insert into results
   select Z.GridName,  
      (select * from 
         ( select 1 [Tag], null [Parent], null [DocumentElement!1!], 
            null [tr!2!], null [tr!2!RowId!Element], null [tr!2!Col2!CDATA]  

            union  all  

            select 2 as Tag, 1 as Parent, 'DocumentElement',GridStart,GridStart, Col2  
            from Table1 S  
            where GridStart != 'GRID' and GridStart != 'END'  
            and Z.DataSheetName = S.DataSheetName  
            and Z.TestCaseName = S.TestCaseName  
            and Z.GridName = S.GridName  
            and Z.GridId = S.GridId  
            and Z.BatchId = S.BatchId 
         ) 
         as GridData 
         order by [tr!2!RowId!Element] 
         for xml explicit 
      ) as G 
    from dbo.Table1 Z 

select  *  from results

Generated output:

<DocumentElement>
  <tr>1<RowId>1</RowId><Col2>r1c1</Col2></tr>
  <tr>10<RowId>10</RowId><Col2>r10c1</Col2></tr>
  <tr>2<RowId>2</RowId><Col2>r2c1</Col2></tr>
  <tr>3<RowId>3</RowId><Col2>r3c1</Col2></tr>
  <tr>4<RowId>4</RowId><Col2>r4c1</Col2></tr>
  <tr>5<RowId>5</RowId><Col2>r5c1</Col2></tr>
  <tr>6<RowId>6</RowId><Col2>r6c1</Col2></tr>
  <tr>7<RowId>7</RowId><Col2>r7c1</Col2></tr>
  <tr>8<RowId>8</RowId><Col2>r8c1</Col2></tr>
  <tr>9<RowId>9</RowId><Col2>r9c1</Col2></tr>
</DocumentElement>


<DocumentElement>
  <tr>1<RowId>1</RowId><Col2><![CDATA[r1c1]]</Col2></tr>
  <tr>2<RowId>2</RowId>Col2><![CDATA[r2c1]]</Col2></tr>
  <tr>3<RowId>3</RowId>Col2><![CDATA[r3c1]]</Col2></tr>
  <tr>4<RowId>4</RowId>Col2><![CDATA[r4c1]]</Col2></tr>
  <tr>5<RowId>5</RowId>Col2><![CDATA[r5c1]]</Col2></tr>
  <tr>6<RowId>6</RowId>Col2><![CDATA[r6c1]]</Col2></tr>
  <tr>7<RowId>7</RowId>Col2><![CDATA[r7c1]]</Col2></tr>
  <tr>8<RowId>8</RowId>Col2><![CDATA[r8c1]]</Col2></tr>
  <tr>9<RowId>9</RowId>Col2><![CDATA[r9c1]]</Col2></tr>
  <tr>10<RowId>10</RowId>Col2><![CDATA[r10c1]]</Col2></tr>
</DocumentElement>
medic
  • 37
  • 8
  • While asking a question, you need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example): (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;). – Yitzhak Khabinsky Mar 09 '22 at 14:58
  • @YitzhakKhabinsky, ok will do. should i remove the question and repost with the sample data? – medic Mar 09 '22 at 15:00
  • No, use the [edit] feature, @medic . – Thom A Mar 09 '22 at 15:01
  • I think you need !! not ! ? – Mike Miller Mar 09 '22 at 15:08
  • I'd love to know why this needs `EXPLICIT` anyway, it seems there is no nesting involved – Charlieface Mar 09 '22 at 15:29
  • `CDATA` is just an encoding thing, the values are exactly the same. See eg https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=ac3d5377dfa87e0a8a2903f9a41f259c Or in other words: what process do you have that differentiates between them? I must say: judging from testing, it appears it is possible to generate it using `FOR XML EXPLICIT` without `, TYPE` not sure why you cannot but you haven't shown sample data so can't test. Possibly you are inserting into a `xml` column? – Charlieface Mar 09 '22 at 15:37
  • I added some sample code. Initially I was using xml raw but some of the data contains values which causes the xml engine to throw an error on the insert, so i figured i could wrap the values with the cdata tag to get around the error. – medic Mar 09 '22 at 16:41
  • Please add sample data, preferably as `CREATE TABLE` and `INSERT` statements – Charlieface Mar 09 '22 at 16:59
  • @Charlieface I did – medic Mar 09 '22 at 17:11
  • @YitzhakKhabinsky I added some sample code – medic Mar 09 '22 at 17:13
  • First of all, SQL Server will itself not generate invalid XML if you use `FOR XML PATH`, I don't think `EXPLICIT` is needed here. Second, you *cannot* preserve `CDATA` when inserting typed XML; SQL Server considers this semantically irrelevant and will instead convert the contents by escaping as necessary and stripping the `CDATA` wrappers. This will never turn valid XML into invalid XML. Your test data doesn't appear to contain any "invalid characters"; I suggest adding some to highlight the actual problem you're having (if any). – Jeroen Mostert Mar 09 '22 at 17:22
  • @JeroenMostert I think I'm not making myself clear enough. I'm not suggesting valid xml will change to be invalid. What i mean is that something in the data is corrupting the the conversion to xml. I tried adding it the bad data to the example but it isnt represented properly so it can be reproduced here. But let's say that the value above (r9c1) has some unprintable characters, basically the data could be garbage, so when the insert into gridxml column in the results table above it throws the following error - XML parsing: line 1, character 142, illegal xml character – medic Mar 09 '22 at 19:27
  • If the character is truly unprintable (as in, a control character) you cannot process it with SQL Server's XML functionality, not even if escaped. `CDATA` will simply not help you there, as that is merely an alternate representation of the character. To get that data to insert, you have to actively remove control characters not allowed by XML, or else process it as an `NVARCHAR(MAX)` (but any time you have to process it as XML it will just be problematic again). [Demo](https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=4081ebdea4a91326f1eb0dfd6d9652b9). – Jeroen Mostert Mar 09 '22 at 19:48
  • @JeroenMostert are there different version of xml that supported the extended characters and if so is there a way to specify that xml ver when that column in the table is an xml datatype https://stackoverflow.com/a/406836/1713000 – medic Mar 09 '22 at 20:32
  • No. SQL Server supports XML 1.0 only. An attempt to select a version other than 1.0 through an XML declaration will just give an "incorrect xml declaration syntax" error. – Jeroen Mostert Mar 09 '22 at 20:35
  • Nothing is corrupting it. The `xml` data type is not an exact textual representation of what you insert. It does not preserve insignificant whitespace for example. The XML is normalized, and that means that CDATA blocks are parsed into whatever they are supposed to be. – Charlieface Mar 09 '22 at 20:36
  • Let's start from the other end: why do you **need** `CDATA`? What parser are you using that requires it? No decent parser should have any problem with either form. Relevant https://stackoverflow.com/a/967721/14868997 – Charlieface Mar 09 '22 at 20:40
  • @Charlieface I think you look through the other comments, JeroenMostert identified what my problem is and I think i'm S out of luck – medic Mar 09 '22 at 20:58
  • I still don't get what you are ultimately trying to achieve, if you elaborate we may be able to help. Why do you need `CDATA` and what is wrong with the result that SQL Server gives you, as they are semantically equivalent – Charlieface Mar 09 '22 at 21:00
  • @Charlieface: CDATA is a complete red herring; the poster thought this would allow them to include control characters in their XML somehow, but it doesn't -- nothing does. Cutting out the control characters (or, I suppose, consistently using a binary encoding for the content) is the only way. It's a classic XY problem. – Jeroen Mostert Mar 09 '22 at 21:06
  • If you have unprintable characters perhaps you should be storing them as binary (which gets encoded as base64) – Charlieface Mar 09 '22 at 21:09
  • @Charlieface yeah I would have to change my model in that case but it what it is, thx for the help everyone – medic Mar 09 '22 at 21:16

0 Answers0