35

I keep getting "XML parser failure: Unterminated attribute" with my parser when I attempt to put HTML text or CDATA inside my XML attribute. Is there a way to do this or is this not allowed by the standard?

Boon
  • 2
    Can you add a source code sample to show us what your structure looks like? – Jordan Parmer Aug 17 '09 at 18:14
  • 1
  • @Pradyumna you are confusing yourself trying to use a CDATA Section, whereas your attribute could simply be of the CDATA attribute type, see [my answer](http://stackoverflow.com/a/29780972/611007). – n611x007 Apr 21 '15 at 19:32
  • @naxa I know, I was just responding to Jordan Parmer's comment above that asked for a sample from the OP. I guess the OP was attempting to do what I wrote in my example. (it's not an answer) BTW, this question is over 5 years old :) – Pradyumna Apr 22 '15 at 05:02
  • related: https://stackoverflow.com/questions/260436/ - related: https://stackoverflow.com/questions/449627/ - related: https://stackoverflow.com/questions/2004386/ – n611x007 Apr 22 '15 at 11:00

6 Answers

30

No. The markup denoting a CDATA section is not permitted as the value of an attribute.

According to the specification, this prohibition is indirect rather than direct. The spec says that an attribute value must not contain a literal open angle bracket (<); open angle brackets and ampersands must be escaped. A CDATA section starts with <![CDATA[, so you cannot insert one there. womp womp.

A CDATA section is interpreted only when it appears in the content (a text node) of an element.
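
One way to see this rule in action is to hand both forms to a parser. The following is only a minimal sketch using Python's standard xml.etree.ElementTree module; the element and attribute names (item, note) and the helper is_well_formed are made up for illustration and do not come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the names "item" and "note" are illustrative only.
with_cdata_in_attribute = '<item note="<![CDATA[a < b]]>"/>'
with_cdata_in_content = "<item><![CDATA[a < b]]></item>"


def is_well_formed(source):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


# The literal "<" inside the attribute value makes the first document ill-formed.
attribute_ok = is_well_formed(with_cdata_in_attribute)

# The same CDATA section is accepted as element content.
content_ok = is_well_formed(with_cdata_in_content)
content_text = ET.fromstring(with_cdata_in_content).text

print(attribute_ok, content_ok, repr(content_text))
```

The exact error message depends on the parser; the point is only that the attribute form is rejected while the element-content form is accepted.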

Cheeso
JMP
  • 2
    While the accepted is merely citing documentation, you give the right answer without any source or backing information. Who defines that CDATA can only be used inside elements rather than attribute values? – dakab Mar 10 '17 at 07:36
  • I updated the answer to cite the section in the XML spec that says attr values cannot contain open angle bracket. – Cheeso Jan 07 '22 at 22:42
16

Attributes can only have plain text inside, no tags, comments, or other structured data. You need to escape any special characters by using character entities. For example:

<code text="&lt;a href=&quot;/&quot;&gt;">

That would give the text attribute the value <a href="/">. Note that this is just plain text so if you wanted to treat it as HTML you'd have to run that string through an HTML parser yourself. The XML DOM wouldn't parse the text attribute for you.
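
As a rough illustration of both halves of that statement, here is a sketch in Python (standard library only). The XML parser hands back the unescaped string, and a separate HTML parser has to be run over it to get at the tag. The class name TagCollector is an assumption made for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The element from the example above, self-closed so it is a complete document.
xml_source = '<code text="&lt;a href=&quot;/&quot;&gt;"/>'

# The XML parser resolves the entity references and returns plain text.
text_value = ET.fromstring(xml_source).get("text")


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


# Treating the string as HTML is a second, explicit step.
collector = TagCollector()
collector.feed(text_value)
collector.close()

print(text_value)
print(collector.tags)
```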

John Kugelman
11

Unfortunately, "CDATA" is ambiguous here. There are "CDATA Sections", and there is the "CDATA Attribute Type".

Your attribute value can be of type CDATA via the "CDATA Attribute Type".

Here is an XML document that contains a "CDATA Section" (a.k.a. CDSect):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<elemke>
<![CDATA[
foo
]]>
</elemke>

Here is an XML document that declares an attribute with the "CDATA Attribute Type" (as AttType):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE elemke [
<!ATTLIST brush wood CDATA #REQUIRED>
]>

<elemke>
<brush wood="guy&#xA;threep"/>
</elemke>

You cannot use a "CDATA Section" as an attribute value. This is wrong: <brush wood=<![CDATA[foo]]>/>

You can use the "CDATA Attribute Type" as your attribute's type. I think this is actually what happens in the usual case, so your attribute value is already CDATA. For an element like <brush wood="guy&#xA;threep"/>, the raw bytes of the .xml file contain guy&#xA;threep, but once the file is processed, the attribute value in memory will be

guy
threep

Your problem may lie in 1) producing a correct XML file and 2) configuring an "XML processor" to produce the output you want.

For example, if you write the raw XML file by hand, you need to put these escapes inside the attribute value in the raw file, as I did with <brush wood="guy&#xA;threep"/> here, instead of <brush wood="guy (newline) threep"/>

Then the parser actually gives you a newline; I've tried this with a processor.
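
The difference between the two spellings can be checked with a small sketch. This one uses Python's standard xml.etree.ElementTree rather than the processors mentioned in this answer, so it is an illustration under that assumption and not a statement about any other processor.

```python
import xml.etree.ElementTree as ET

# Same element twice: once with a character reference, once with a literal newline.
with_reference = '<brush wood="guy&#xA;threep"/>'
with_literal_newline = '<brush wood="guy\nthreep"/>'

value_from_reference = ET.fromstring(with_reference).get("wood")
value_from_literal = ET.fromstring(with_literal_newline).get("wood")

# The character reference survives as a real newline;
# the literal newline is normalized to a single space.
print(repr(value_from_reference))
print(repr(value_from_literal))
```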

You can try it with a processor like Saxon or, as a poor man's experiment, with a browser: open the XML in Firefox and copy the value to a text editor. Firefox displayed the newline as a space, but copying the string to a text editor showed the newline. (With a better suited processor you could probably save the direct output right away.)

Now the "only" thing you need to do is make sure you handle this CDATA appropriately. For example, if you have an XSL stylesheet that produces HTML, you can use something like this .xsl for such an XML file:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet  version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template name="split">
  <xsl:param name="list"      select="''" />
  <xsl:param name="separator" select="'&#xA;'" />
  <xsl:if test="not($list = '' or $separator = '')">
    <xsl:variable name="head" select="substring-before(concat($list, $separator), $separator)" />
    <xsl:variable name="tail" select="substring-after($list, $separator)" />

    <xsl:value-of select="$head"/>
    <br/><xsl:text>&#xA;</xsl:text>
    <xsl:call-template name="split">
        <xsl:with-param name="list"      select="$tail" />
        <xsl:with-param name="separator" select="$separator" />
    </xsl:call-template>
  </xsl:if>
</xsl:template>


<xsl:template match="brush">
  <html>
  <xsl:call-template name="split">
    <xsl:with-param name="list" select="@wood"/>
  </xsl:call-template>
  </html>
</xsl:template>

</xsl:stylesheet>

In a browser, or with a processor like Saxon (Saxon Home Edition 9.5, run as java -jar saxon9he.jar -s:eg2.xml -xsl:eg2.xsl -o:eg2.html), this would produce the following HTML-like output:

<html>guy<br>
   threep<br>

</html>  

which will look like this in a browser:

guy
threep

Here I am using a recursive template 'split' from Tomalak, thanks to Mads Hansen, because my target processor supports neither string-join nor tokenize, which are version 2.0 only.

Community
n611x007
  • by the way, here is a common-sense writing about distinction between xml "parsing" and "processing", http://www.oxygenxml.com/archives/xsl-list/200009/msg00750.html It looks good but I haven't checked its level of correctness. – n611x007 Apr 21 '15 at 20:03
10

If an attribute is not a tokenized or enumerated type, it is processed as CDATA. The details for how the attribute is processed can be found in the Extensible Markup Language (XML) 1.0 (Fifth Edition).

3.3.1 Attribute Types

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types are more constrained. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3.3 Attribute-Value Normalization.

[54]  AttType       ::=    StringType | TokenizedType | EnumeratedType
[55]  StringType    ::=    'CDATA'
[56]  TokenizedType ::=    'ID' [VC: ID]
            [VC: One ID per Element Type]
            [VC: ID Attribute Default]
        | 'IDREF'      [VC: IDREF]
        | 'IDREFS'     [VC: IDREF]
        | 'ENTITY'     [VC: Entity Name]
        | 'ENTITIES'   [VC: Entity Name]
        | 'NMTOKEN'    [VC: Name Token]
        | 'NMTOKENS'   [VC: Name Token]

...

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
  2. Begin with a normalized value consisting of the empty string.
  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
    • For a character reference, append the referenced character to the normalized value.
    • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
    • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
    • For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read SHOULD be treated by a non-validating processor as if declared CDATA.

It is an error if an attribute value contains a reference to an entity for which no declaration has been read.
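
To make the quoted algorithm more concrete, here is a deliberately simplified sketch of it in Python. It is not a real XML processor: the function name normalize_attribute is hypothetical, only the five predefined entities are known, and entity replacement text is not processed recursively. Under those assumptions it follows steps 1 to 3 and the extra trimming for non-CDATA types.

```python
import re

# Assumption: only the five predefined entities are declared.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# One token is a hex char reference, a decimal char reference,
# an entity reference, or any single character.
TOKEN = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][\w.:-]*);|(.)", re.DOTALL
)


def normalize_attribute(raw, is_cdata=True):
    """Simplified attribute-value normalization (sketch, not spec-complete)."""
    # Step 1: line breaks are normalized to #xA first.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")
    out = []  # Step 2: start from an empty normalized value.
    for hex_ref, dec_ref, name, char in TOKEN.findall(raw):  # Step 3
        if hex_ref:
            out.append(chr(int(hex_ref, 16)))  # character reference, kept as is
        elif dec_ref:
            out.append(chr(int(dec_ref)))  # character reference, kept as is
        elif name:
            out.append(PREDEFINED[name])  # KeyError for an undeclared entity
        elif char in " \r\n\t":
            out.append(" ")  # literal white space becomes a space
        else:
            out.append(char)
    value = "".join(out)
    if not is_cdata:
        # Non-CDATA types: trim the ends and collapse runs of spaces.
        value = re.sub(" +", " ", value.strip(" "))
    return value


print(repr(normalize_attribute("guy&#xA;threep")))
print(repr(normalize_attribute("guy\nthreep")))
print(repr(normalize_attribute("  a   b ", is_cdata=False)))
```

In this sketch a value such as a b with a single literal space comes out unchanged, while a literal tab or newline comes out as a space and a referenced one (&#x9;, &#xA;) comes out as the character itself.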

Rich Seller
  • 1
    I may have interpretation difficulties. Will the `myattr="a b"` Attribute result in the normalized value `a b`? – n611x007 Apr 21 '15 at 16:18
2

We can't use a CDATA section as an attribute value, but we can embed HTML by escaping it with character entities. Here is one example:

to achieve this: <span class="abc">Your Text</span>

use XML code like this:

<xmlNode attributeName="&lt;span class=&quot;abc&quot;&gt;Your Text&lt;&#47;span&gt;"></xmlNode>

sumit raju
1

Yes, you can, as long as you encode the content inside the attribute value, i.e. use &amp; &lt; &gt; &quot; &apos;. That way it will not be seen as markup inside your markup.
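
If the XML is generated by a program, the escaping does not have to be done by hand. As a sketch, assuming Python and its standard xml.sax.saxutils.quoteattr helper (the element and attribute names xmlNode and attributeName are just placeholders), a round trip could look like this:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

html_fragment = '<span class="abc">Your Text</span>'

# quoteattr escapes &, < and > (and quote characters where needed)
# and returns the value already wrapped in quotes.
document = "<xmlNode attributeName=" + quoteattr(html_fragment) + "/>"

# Parsing the document gives back the original, unescaped string.
round_tripped = ET.fromstring(document).get("attributeName")

print(document)
print(round_tripped)
```

Building the document through an XML library's own serializer would achieve the same thing; string concatenation is used here only to keep the escaping step visible.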

John Saunders