
As we know, CDATA sections cannot be nested, so I like the solution from "Using CDATA inside another CDATA", which replaces ]]> with ]]]]><![CDATA[>.

Therefore

<Root>
    <![CDATA[ 
        <AAA>
            <![CDATA[ 
                <BBB>hello world</BBB>
            ]]>
        </AAA>
    ]]>
</Root>

becomes

<Root>
    <![CDATA[ 
        <AAA>
            <![CDATA[ 
                <BBB>hello world</BBB>
            ]]]]><![CDATA[>
        </AAA>
    ]]>
</Root>
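
The substitution itself can be sketched in a few lines. Python is used here only for illustration, and `wrap_in_cdata` is a hypothetical helper name, not part of any library:

```python
def wrap_in_cdata(text: str) -> str:
    """Wrap text in a CDATA section.

    Any ']]>' inside the text is split across two adjacent CDATA
    sections, so the terminator never appears inside a section.
    """
    escaped = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + escaped + "]]>"


# Hypothetical payload that itself contains a CDATA section.
payload = "<AAA><![CDATA[<BBB>hello world</BBB>]]></AAA>"
document = "<Root>" + wrap_in_cdata(payload) + "</Root>"
print(document)
```

The first section ends with `]]` and the next one starts with `>`, so a parser that concatenates character data should see the original `]]>` again.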

The XML is the response of my API, which will be used by other programs not under my control.

In .NET, my experiment shows that InnerText returns the concatenated text of all the CDATA sections.

var root = doc.SelectSingleNode("/Root"); // doc is an XmlDocument holding the XML above
var cdata = root.InnerText;

cdata is

<AAA>
    <![CDATA[ 
        <BBB>hello world</BBB>
    ]]>
</AAA>

Does this .NET behavior comply with any standard? Is there a standard that says how to handle adjacent CDATA sections? If my API returns adjacent CDATA sections, will other programs or programming languages have trouble processing them?

Gqqnbig

1 Answer


This behaviour is fully standards-compliant and should produce the same result in any XML processor. CDATA sections can be used to escape any character data anywhere (except in another CDATA section), and you can use as many of them as you like, adjacent or not. From the specification:

Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup.
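
As a quick check outside .NET, here is a minimal sketch using Python's standard `xml.etree.ElementTree` on the document from the question. The `inner_text` helper is a hypothetical name meant to mirror what InnerText does, not an existing API:

```python
import xml.etree.ElementTree as ET

# The escaped document from the question: two adjacent CDATA sections.
XML = """<Root>
    <![CDATA[
        <AAA>
            <![CDATA[
                <BBB>hello world</BBB>
            ]]]]><![CDATA[>
        </AAA>
    ]]>
</Root>"""


def inner_text(xml_source: str) -> str:
    """Concatenate all character data under the root element."""
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())


text = inner_text(XML)
print(text)
```

The parser hands back one run of character data in which the two sections are joined, so the inner `]]>` is restored and the `]]]]>` escape sequence is gone. Unlike the .NET output in the question, surrounding whitespace is kept here.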

TToni
  • I cannot find a definition of InnerText; the specification probably uses another term I'm not aware of. What is the InnerText of an element: is it the text in all CDATA sections? Are adjacent CDATA sections considered as one? – Gqqnbig Mar 04 '16 at 19:57
  • InnerText is the .NET equivalent of the character data (https://www.w3.org/TR/REC-xml/#dt-chardata) inside a given element. Any part of this character data can be escaped with CDATA, which .NET then unescapes when you call InnerText. In the extreme case you could put each character inside its own CDATA section; in the other (usual) case you don't use any CDATA escapes. Whether CDATA sections are adjacent or not is irrelevant. Text inside CDATA sections is not interpreted any further (which is the whole point of CDATA escapes: to protect text from XML interpretation). – TToni Mar 05 '16 at 09:39
  • Besides, https://dom.spec.whatwg.org/#interface-text says `text .wholeText` Returns the **combined data** of all direct Text node siblings. – Gqqnbig Jun 23 '16 at 18:29