What characters must be escaped in XML documents, or where could I find such a list?
-
Example: `AT&T` – jacktrades Dec 05 '12 at 19:47
See [**Simplified XML Escaping**](https://stackoverflow.com/a/46637835/290085) below for a concise and easily remembered guide that I've distilled from primary sources ([*W3C Extensible Markup Language (XML) 1.0 (Fifth Edition)*](https://www.w3.org/TR/xml/#syntax)). – kjhughes Feb 14 '18 at 16:40
-
Literally none of the answers here are correct. You also must escape many various control characters in XML 1.1. – Jason C May 04 '21 at 18:27
-
@JasonC: Understanding the question as intended rather than literally is ideal. If you feel future readers would benefit from an elaboration of how to specify control characters in XML, please elaborate in an answer. Thanks. – kjhughes Dec 03 '21 at 16:52
-
@kjhughes With the question being interpreted as intended, literally none of the answers here are correct. You also must escape many various control characters in XML 1.1, as outlined [here](https://www.w3.org/International/questions/qa-controls). See also XML 1.1 [§4.1](https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-references), [§4.4](https://www.w3.org/TR/2006/REC-xml11-20060816/#entproc), [§4.6](https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-predefined-ent), and [Appx. C](https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-entexpand) for specific details and restrictions. – Jason C Dec 03 '21 at 20:26
-
@JasonC: I've updated [Simplified XML Escaping](https://stackoverflow.com/a/46637835/290085) below to address your point. Let me know if you have further recommendations. Thanks. – kjhughes Dec 04 '21 at 01:18
10 Answers
If you use an appropriate class or library, they will do the escaping for you. Many XML issues are caused by string concatenation.
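As a minimal sketch of that advice (Python's standard-library `xml.etree.ElementTree` is my choice for illustration, not something this answer names), building the document through an API leaves the escaping to the serializer:

```python
import xml.etree.ElementTree as ET

# Build the element through the API instead of concatenating strings.
company = ET.Element("company", {"motto": 'We say "hi" & <wave>'})
company.text = "AT&T <Research>"

# The serializer escapes markup-significant characters on output.
xml_text = ET.tostring(company, encoding="unicode")
print(xml_text)

# Parsing the result gives back the original, unescaped strings.
parsed = ET.fromstring(xml_text)
assert parsed.text == "AT&T <Research>"
assert parsed.get("motto") == 'We say "hi" & <wave>'
```

The round trip at the end is the useful check: whatever escaping the library chose, a parser recovers exactly the strings that went in.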
XML escape characters
There are only five:
&quot;   "
&apos;   '
&lt;     <
&gt;     >
&amp;    &
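For illustration, here is a sketch using Python's `xml.sax.saxutils` (my choice of library, not one the answer mentions). Its `escape()` handles `&`, `<`, and `>` by default, and the two quote characters can be added through an extra mapping; the `QUOTES` and `UNQUOTES` names are mine:

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > by default; quotes are opt-in via a mapping.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = "\"'<>&"
escaped = escape(raw, QUOTES)
print(escaped)

# unescape() reverses the mapping and restores the original string.
assert unescape(escaped, UNQUOTES) == raw
```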
Escaping characters depends on where the special character is used.
The examples can be validated at the W3C Markup Validation Service.
Text
The safe way is to escape all five characters in text. However, the three characters ", ', and > needn't be escaped in text:
<?xml version="1.0"?>
<valid>"'></valid>
Attributes
The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:
<?xml version="1.0"?>
<valid attribute=">"/>
The ' character needn't be escaped in attributes if the quotes are ":
<?xml version="1.0"?>
<valid attribute="'"/>
Likewise, the " character needn't be escaped in attributes if the quotes are ':
<?xml version="1.0"?>
<valid attribute='"'/>
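The delimiter-dependent rule above is what Python's `xml.sax.saxutils.quoteattr` implements. This is a sketch with a library of my choosing, not one the answer refers to. It returns the value already wrapped in quotes and picks the delimiter that avoids escaping where possible:

```python
from xml.sax.saxutils import quoteattr

# Only an apostrophe: double quotes are used as the delimiter.
print(quoteattr("it's"))

# Only double quotes: single quotes are used as the delimiter.
print(quoteattr('say "hi"'))

# Both kinds: double-quoted, with the inner double quote escaped.
print(quoteattr("a'b\"c"))

# & and < are escaped regardless of the delimiter.
print(quoteattr("a<b&c"))
```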
Comments
None of the five special characters should be escaped in comments, where entity references are not expanded:
<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>
CDATA
None of the five special characters should be escaped in CDATA sections, where entity references are not expanded:
<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>
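When generating CDATA programmatically, the one sequence that cannot appear inside a section is `]]>`, because it ends the section. A common workaround is to split it across two adjacent sections. The sketch below assumes Python; `to_cdata` is a hypothetical helper name, not part of any library:

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "\"'<>& and even ]]> inside"
document = "<valid>" + to_cdata(payload) + "</valid>"
print(document)

# A parser joins the adjacent sections back into the original text.
assert ET.fromstring(document).text == payload
```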
Processing instructions
None of the five special characters should be escaped in XML processing instructions:
<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>
XML vs. HTML
HTML has its own set of escape codes which cover a lot more characters.

-
But as for HTML, we would only have to escape the five above too right? – Pacerier Jan 12 '12 at 21:51
-
@Pacerier, I beg you not to write your own XML/HTML escaping code. Use a library function or you're bound to miss a special case. – Jason Mar 16 '12 at 09:23
-
Also, for line breaks you need to use `&#xA;` and for tab `&#x9;`, if you need these characters in an attribute. – radistao Nov 26 '12 at 22:33
-
Carriage Return `&#xD;` is only included for backward-compatibility as noted in the section that precedes the one linked to by MicSim. Avoid using it as it is either removed or replaced by `&#xA;`. – seininn May 17 '13 at 18:44
-
If you're going to do a Find/Replace on these, just remember to do the & replacement before the others. – Doug Jun 15 '13 at 21:29
-
@Doug I was just about to mention the exact same thing - or else all other replaced characters will be corrupted, and things like `&quot;` will be changed to `&amp;quot;` – Jerry Dodge Aug 05 '13 at 22:23
-
Notice that in HTML you actually just have to escape `<` and `&`. While the other three are also defined, there is actually no need to escape them within valid XML – dirkk Apr 29 '14 at 14:34
-
From Wikipedia: "All permitted Unicode characters may be represented with a numeric character reference." So there are a lot more than 5. – Tim Cooper Aug 15 '14 at 07:47
-
@dirkk I found the same to be true in my testing. I escaped all 5 originally to be safe, even though the ampersand alone was the original target for the bug. Upon further testing I was finding that the apostrophe for example was making it through to the application with no problems at all. – Brien Foss Mar 09 '16 at 18:08
-
You *can* escape any characters you want -- even *every* character. Only less-than, ampersand, and the sequence "]]>" actually matter if you're trying to turn an arbitrary string into XML content (that is, you don't want any tags or other XML constructs to be detected within it). "]]>" is uncommon, so some people ignore it; or you can change the ">" in it to &gt; or &#62; or &#x3E; – TextGeek Jul 24 '19 at 14:40
-
-
@Jason I looked up source code of "xml-escape" lib for js. It has 22 lines of code and covers exactly 5 chars. Seems trivial enough. But that's just the actual XML. HTML is different animal altogether. – Gherman Jul 19 '22 at 00:35
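The ordering advice in the comments above (replace `&` before the others) is easy to check. This is a minimal Python sketch; both helper names are hypothetical and cover only three characters for brevity:

```python
def escape_amp_first(s: str) -> str:
    # Correct order: ampersands are handled before any entity is introduced.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")

def escape_amp_last(s: str) -> str:
    # Wrong order: the & of each freshly inserted entity gets escaped again.
    return s.replace("<", "&lt;").replace('"', "&quot;").replace("&", "&amp;")

raw = '<a href="x">'
print(escape_amp_first(raw))
print(escape_amp_last(raw))
```

The second function turns every `&lt;` and `&quot;` it just produced into `&amp;lt;` and `&amp;quot;`, which is the corruption described in the comments.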
Perhaps this will help:
List of XML and HTML character entity references:
In SGML, HTML and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters, in which each character can manifest directly (representing itself), or can be represented by a series of characters called a character reference, of which there are two types: a numeric character reference and a character entity reference. This article lists the character entity references that are valid in HTML and XML documents.
That article lists the following five predefined XML entities:
quot "
amp &
apos '
lt <
gt >

New, simplified answer to an old, commonly asked question...
Simplified XML Escaping (prioritized, 100% complete)
Always (90% important to remember)
Attribute Values (9% important to remember)
- attr=" 'Single quotes' are ok within double quotes."
- attr=' "Double quotes" are ok within single quotes.'
- Escape " as &quot; and ' as &apos; otherwise.
Comments, CDATA, and Processing Instructions (0.9% important to remember)
Esoterica (0.1% important to remember)
- Escape control codes in XML 1.1 via Base64 or Numeric Character References.
- Escape ]]> as ]]&gt; unless ]]> is ending a CDATA section.
  (This rule applies to character data in general – even outside a CDATA section.)
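For element content specifically, the rules in this thread reduce to three replacements: `&`, `<`, and the `>` of a literal `]]>`. The Python sketch below is my own illustration, not code from this answer, and `escape_text` is a hypothetical name; it leaves quotes and ordinary `>` characters alone:

```python
import xml.etree.ElementTree as ET

def escape_text(s: str) -> str:
    """Minimal escaping for element content: &, <, and the ">" of "]]>"."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace("]]>", "]]&gt;")

raw = "if (a[b[0]]>c && d<e) \"quotes\" and 'apostrophes' stay as they are"
doc = "<code>" + escape_text(raw) + "</code>"
print(doc)

# A conformant parser recovers the original text unchanged.
assert ET.fromstring(doc).text == raw
```

This applies to text between tags only; attribute values additionally need the delimiter quote escaped.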

-
One other rule worth noting: `]]>` must be escaped as `]]&gt;`, even when not in a CDATA section. The easiest way of achieving that may be to *always* escape `>` as `&gt;`. – Michael Kay May 29 '18 at 15:24
-
Thanks, @MichaelKay. I've incorporated your helpful note about `]]>` but chose to relegate it to esoterica rather than suggesting that `>` *always* be escaped (which it needn't be, as you know). My goal here is to make the XML escaping rules ***easily remembered*** *and* ***100% accurate***. – kjhughes Jun 03 '18 at 14:01
-
The above answers including accepted one mention all five characters should be escaped inside attributes. Do you have any reference to XML standard to back what you are saying as your answer logically seems to be the correct one? – Roman Susi Feb 07 '20 at 05:49
-
@RomanSusi: Yes, many other answers contain errors or overgeneralizations ("The safe way...") based on hearsay, misinterpretation, or misunderstanding of the official XML BNF. My answer is (a) 100% justified by W3C XML Recommendation; see the many linked references to the official BNF, and (b) organized in a concise, logical, and easily remembered progression of those requirements. – kjhughes Feb 07 '20 at 13:44
-
@RomanSusi: The specific statement that "all five characters should be escaped inside attributes" is sloppy guidance unsupported by the official BNF rule for `AttValue` cited in my answer via a link on **2.** [**Attribute Values**](https://www.w3.org/TR/xml/#NT-AttValue). – kjhughes Feb 07 '20 at 13:44
-
Ah ok... I was actually looking whether & needs to be escaped, so missed Always part, thanks! – Roman Susi Feb 07 '20 at 14:21
-
1I think I should change my future first child name from Felipe to ";'Felipe]]>
-
-
@FelipeValdes: Conformant XML parsers will reject documents as not well-formed when they contain `]]>` anywhere other than ending a CDATA section or the null char anywhere in a document. What browsers will do over time and the impact on your children's development are less clear. – kjhughes Nov 18 '20 at 14:41
According to the specifications of the World Wide Web Consortium (W3C), there are 5 characters that must not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. In all the other cases, these characters must be replaced either using the corresponding entity or the numeric reference according to the following table:
Original character   XML entity replacement   XML numeric replacement
<                    &lt;                     &#60;
>                    &gt;                     &#62;
"                    &quot;                   &#34;
&                    &amp;                    &#38;
'                    &apos;                   &#39;
Notice that the aforementioned entities can also be used in HTML, with the exception of &apos;, which was introduced with XHTML 1.0 and is not declared in HTML 4. For this reason, and to ensure backward compatibility, the XHTML specification recommends the use of &#39; instead.
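The numeric column of the table is simply each character's decimal code point. This is a small Python sketch of the mapping; `numeric_ref` is a hypothetical helper name used only for illustration:

```python
import xml.etree.ElementTree as ET

def numeric_ref(ch: str) -> str:
    """Decimal numeric character reference for a single character."""
    return f"&#{ord(ch)};"

specials = "<>\"&'"
for ch in specials:
    print(ch, numeric_ref(ch))

# A parser resolves the references back to the literal characters.
encoded = "".join(numeric_ref(ch) for ch in specials)
assert ET.fromstring(f"<t>{encoded}</t>").text == specials
```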

-
XML predefines those five entities, but it absolutely does NOT specify that you can't use any of those five characters in their literal form. < and & have to be escaped everywhere (except CDATA). " and ' only have to be escaped in attribute values, and only if the corresponding quote character is the same. And > never actually has to be escaped. – Shaun McCance Aug 24 '13 at 13:58
-
As written above, < > " & ' do not have to be escaped when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. i.e. when you use < > as an XML tag you don't escape it. Same thing for a comment (would you escape an & in a commented line of a XML file? You don't need to, and your XML is still valid if you don't). This is clearly specified in the [official recommendations for XML by W3C](http://www.w3.org/TR/xml11/#dt-chardata). – Albz Oct 01 '13 at 07:21
-
@ShaunMcCance `>` must be escaped if it follows `]]` within content, unless it's intended to be part of the `]]>` delimiter that indicates the end of a CDATA section. – Lee D Apr 25 '14 at 17:45
-
Not to be a necromancer, but @Albz is incorrect in saying that these characters MUST be entitized in content. See section 2.4 at https://www.w3.org/TR/REC-xml/#NT-CharData. The TL;DR version of that is that in chardata element content, & and < have to always be entitized. The > character MAY be entitized, although it MUST be when appearing in the literal string “]]>” because otherwise that will be read as ending a CDATA section. For single-quote and double-quote, you can escape if you want to. That's it, for chardata inside elements. Other components of XML have other rules. – chris May 03 '16 at 17:52
Escaping characters is different for tags and attributes.
For tags:
<   &lt;
>   &gt; (only for compatibility, read below)
&   &amp;
For attributes:
"   &quot;
'   &apos;
From Character Data and Markup:
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

-
This implies that for attributes only quotes need to be escaped, but that is in addition to the other three characters – eug Jul 05 '18 at 04:46
In addition to the commonly known five characters [<, >, &, ", and '], I would also escape the vertical tab character (0x0B). It is valid UTF-8, but not valid XML 1.0, and even many libraries (including the highly portable (ANSI C) library libxml2) miss it and silently output invalid XML.
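One caveat worth making concrete: in XML 1.0, a character outside the `Char` production (such as 0x0B) cannot be written as a numeric character reference either, so in practice it has to be removed or encoded some other way before serialization. The Python sketch below filters such characters; the names `INVALID_XML10` and `strip_invalid_xml10` are hypothetical, and dropping the characters (rather than replacing them) is an assumption about what the application wants:

```python
import re

# Complement of the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML10 = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml10(s: str) -> str:
    """Drop characters that an XML 1.0 document may not contain."""
    return INVALID_XML10.sub("", s)

# Vertical tab (0x0B) and form feed (0x0C) are removed; tab is kept.
assert strip_invalid_xml10("a\x0bb\x0cc\td") == "abc\td"
```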

Abridged from: XML, Escaping
There are five predefined entities:
&lt; represents "<"
&gt; represents ">"
&amp; represents "&"
&apos; represents '
&quot; represents "
"All permitted Unicode characters may be represented with a numeric character reference." For example:
&#20013; (or &#x4E2D;) represents "中"
Most of the control characters and other Unicode ranges are specifically excluded, meaning (I think) they can't occur either escaped or direct:

The accepted answer is not correct. It is best to use a library for escaping XML.
As mentioned in this other question
"Basically, the control characters and characters out of the Unicode ranges are not allowed. This means also that calling for example the character entity is forbidden."
If you only escape the five characters, you can run into errors such as "An invalid XML character (Unicode: 0xc) was found".

-
-
Each language will be different. You can check Java in this other Stackoverflow question https://stackoverflow.com/a/439311 – Gabriel Furstenheim May 04 '23 at 07:38
It depends on the context. For the content, it is < and &, and ]]> (though a string of three instead of one character).
For attribute values, it is <, &, ", and '.
For CDATA, it is ]]>.

Only < and & are required to be escaped if they are to be treated as character data and not markup:
