Convert CDATA node to encoded string in .NET

Question

TL;DR - in .NET, using XmlDocument or XDocument, is there an easy way (XPath?) to find CDATA nodes so they can be removed and their contents encoded?

Details...

My system has lots of situations where it builds XML strings manually (e.g. by string concatenation, rather than via XmlDocument or XDocument). These strings can contain multiple <![CDATA[...]]> nodes, which can appear at any level of the structure, e.g.

<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two></data>

When this data is stored in a SQL Server XML column, the <![CDATA[..]]> wrapper is automatically removed and the inner text encoded. This is standard for SQL Server, which doesn't "do" CDATA.

My issue is that I have complex code that takes two instances of a class and audit-trails the differences between them. One or more of the properties could be a string containing XML.

This results in a mismatch (and therefore an audit-trail entry) when nothing has actually changed, because the code creates one form of XML and SQL Server returns another, e.g.

// Manually generated XML string...
<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two></data>
// String returned by SQL Server...
<data><one>ab&amp;cd</one><two><inner>xy&lt;z</inner></two></data>

Is there an easy way in .NET to process the manually generated XML and convert each CDATA node into its encoded version, so I can compare the string to the one returned by SQL Server?

Is there a SelectNodes XPath that would find all those nodes?

(And before anybody states it, the obvious solution is to not use CDATA in the manual creation of the XML in the first place... however, this is not possible due to the sheer number of instances.)

_this is not possible due to the sheer number of instances_: This statement seems to be a fallacy. It's most likely possible, just not desirable. — Tu deschizi eu inchid, Jan 27 '23 at 18:54
Couldn't you process the XML strings before converting to `XmlDocument`/`XDocument` with `Regex.Replace` method version? — NetMage, Jan 27 '23 at 20:12
@NetMage - Regex definitely has its place, and I do use it regularly... but trying to parse XML with it is a dangerous game. See the [notorious answer](https://stackoverflow.com/a/1732454/930393) to somebody asking if they can parse XHTML with regex — freefaller, Jan 30 '23 at 09:03
@freefaller Ah, but you are not parsing, you are trying to convert a single element. I suppose there are potential issues with comments and strings if they happened to contain a CDATA pattern, but that seems pretty unlikely... — NetMage, Jan 30 '23 at 17:36

score 2 · Accepted Answer · answered Jan 28 '23 at 09:16

Easy with one foreach loop and ReplaceChild:

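using System;
using System.Linq;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two><three><inner>a &lt; b</inner></three></data>");

// text() also matches CDATA sections; OfType keeps only those.
// ToList() snapshots the matches before the tree is modified.
foreach (var cdata in doc.SelectNodes("//text()").OfType<XmlCDataSection>().ToList())
{
    // A plain text node is entity-encoded when the document is serialized.
    cdata.ParentNode.ReplaceChild(doc.CreateTextNode(cdata.Data), cdata);
}

Console.WriteLine(doc.OuterXml);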

Outputs

<data><one>ab&amp;cd</one><two><inner>xy&lt;z</inner></two><three><inner>a &lt; b</inner></three></data>
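Since the question also mentions XDocument, the same idea can be sketched with LINQ to XML. This is a minimal sketch, not part of the original answer: the helper name NormalizeCData is hypothetical, and it assumes the input is well-formed XML and that whitespace between elements is not significant for the comparison.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not from the original answer): parses the XML,
// swaps every CDATA section for an ordinary text node, and serializes
// the result without indentation.
string NormalizeCData(string xml)
{
    var doc = XDocument.Parse(xml);

    // ToList() snapshots the matches so the tree can be modified in the loop.
    foreach (var cdata in doc.DescendantNodes().OfType<XCData>().ToList())
    {
        // An XText node is entity-encoded on output; an XCData node is not.
        cdata.ReplaceWith(new XText(cdata.Value));
    }

    return doc.ToString(SaveOptions.DisableFormatting);
}

var generated = "<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two></data>";
var fromSqlServer = "<data><one>ab&amp;cd</one><two><inner>xy&lt;z</inner></two></data>";

// Compare the normalized string with the form SQL Server returns.
Console.WriteLine(NormalizeCData(generated) == fromSqlServer);
```

Running both strings through the same helper before comparing (rather than only the generated one) may be more robust, because any other serialization differences are then normalized on both sides.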

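Another option would be to run the XML through an XSLT identity transformation with XslCompiledTransform. The XSLT data model does not distinguish CDATA sections from ordinary text, so copying every node and serializing the result emits the content in encoded form, e.g. with this stylesheet: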

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
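One way to apply that stylesheet from C# might look like the following. This is a sketch under assumptions: the helper name RunIdentityTransform is hypothetical, the stylesheet is embedded as a string rather than loaded from a file, and an xsl:output element with omit-xml-declaration="yes" has been added (it is not in the stylesheet above) so the result can be compared directly with the string SQL Server returns.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Identity stylesheet as above; the xsl:output line is an added assumption
// so that no XML declaration is written to the result.
var identityXslt = @"<xsl:stylesheet xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"" version=""1.0"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes""/>
  <xsl:template match=""@* | node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@* | node()""/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

// Hypothetical helper: compiles the stylesheet and transforms one XML string.
string RunIdentityTransform(string xml, string stylesheet)
{
    var xslt = new XslCompiledTransform();
    using (var stylesheetReader = XmlReader.Create(new StringReader(stylesheet)))
    {
        xslt.Load(stylesheetReader);
    }

    var output = new StringWriter();
    using (var input = XmlReader.Create(new StringReader(xml)))
    // OutputSettings carries the xsl:output choices over to the writer.
    using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
    {
        xslt.Transform(input, writer);
    }

    return output.ToString();
}

var generated = "<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two></data>";
Console.WriteLine(RunIdentityTransform(generated, identityXslt));
```

If many strings are processed, compiling the stylesheet once and reusing the XslCompiledTransform instance would avoid repeating the Load step on every call.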

Excellent - thank you @Martin. Didn't occur to me to test the type of the nodes... I was concentrating on trying to find an appropriate xpath. The first part of your answer is perfect — freefaller, Jan 30 '23 at 08:59
