My customer wants me to write my XML file as <name>Smith & Jones</name>, not <name>Smith &amp; Jones</name>.
I can't find a quality reference discussing this.
From the XML specification (§2.4):
The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. They are also legal within the literal entity value of an internal entity declaration; see "4.3.2 Well-Formed Parsed Entities". If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.
Since this circumstance fits into none of the stated categories, it is illegal.
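To see the rule in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The helper name is_well_formed is made up for illustration; other conforming parsers should behave the same way, though their error messages will differ.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

raw = "<name>Smith & Jones</name>"          # bare ampersand in element content
escaped = "<name>Smith &amp; Jones</name>"  # predefined entity reference
numeric = "<name>Smith &#38; Jones</name>"  # numeric character reference

results = {
    "raw": is_well_formed(raw),
    "escaped": is_well_formed(escaped),
    "numeric": is_well_formed(numeric),
}
print(results)
```

Only the raw form is rejected. Both escaped forms parse, and the element text the application sees is the plain string "Smith & Jones" in each case.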
Use a CDATA section to include these characters inside an element without the XML parser interpreting them:
<name>Smith & Jones</name>
becomes
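<name><![CDATA[Smith & Jones]]></name>
This way you can also put plain HTML within XML, provided the text does not itself contain the sequence ]]>. Note that any whitespace inside the CDATA section becomes part of the element's content.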
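As a rough check, assuming Python's standard xml.etree.ElementTree, a CDATA section and an escaped ampersand give the application the same text. The variable names below are illustrative only.

```python
import xml.etree.ElementTree as ET

# The same name written three ways.
cdata = ET.fromstring("<name><![CDATA[Smith & Jones]]></name>")
escaped = ET.fromstring("<name>Smith &amp; Jones</name>")
padded = ET.fromstring("<name><![CDATA[ Smith & Jones ]]></name>")

# CDATA and entity escaping are two spellings of the same character data.
print(repr(cdata.text))
print(repr(escaped.text))

# Spaces inside the CDATA delimiters are kept as content.
print(repr(padded.text))
```

The first two values are identical, while the third keeps its leading and trailing space, so padding inside the CDATA delimiters changes the data.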
You can't, at least if you want to keep calling your file "XML". XML does not allow unescaped ampersands in element content, and any conforming parser will reject a file containing them as not well-formed.
You can use CDATA, but that introduces its own ugliness, and most serializers don't generate CDATA by default.
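For illustration, here is what one common serializer does by default. This sketch assumes Python's standard library (xml.etree.ElementTree and xml.sax.saxutils.escape); the sample strings are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Build the element from the plain string; the serializer handles escaping.
name = ET.Element("name")
name.text = "Smith & Jones"
serialized = ET.tostring(name, encoding="unicode")
print(serialized)

# Parsing the output gives back the original, unescaped text.
round_trip = ET.fromstring(serialized).text
print(round_trip)

# When assembling XML by hand, escape() converts &, < and > in text content.
manual = "<name>" + escape("Smith & Jones <Ltd>") + "</name>"
print(manual)
```

The serializer writes &amp; rather than a CDATA section, and the escaping disappears again on parsing, so the consumer of the file never sees it.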
The XML specification is clear that this is not well-formed XML.
If you want to know WHY the spec was written that way, that's always a much harder question to answer. Sometimes (but not this time) Tim Bray's annotated version of the XML recommendation at http://www.xml.com/axml/testaxml.htm sheds some light. Sometimes (but not this time) the comments and other notes in the XML source of the specification at http://www.w3.org/TR/1998/REC-xml-19980210.xml are revealing. In the absence of such clues, it is useful to recall that the creators of XML were very anxious to preserve compatibility with SGML, and that they were generally disposed towards having parsers that could detect errors in the XML rather than making XML easy to author.