5

Are there any characters that are encoded in HTML but not XML, or vice versa?

Are all the encodings the same between them, such as &gt; for the greater-than symbol?

Brett Allen

2 Answers

7

XML does predefine a handful of character entities. See section 4.6 of the XML 1.1 spec:

http://www.w3.org/TR/xml11/#sec-predefined-ent

In particular, XML defines &lt;, &gt;, &amp;, &apos;, and &quot; ("All XML processors MUST recognize these entities whether they are declared or not").
Any other entities must be referenced via numeric reference, as Brian states, or by an appropriate definition in an <!ENTITY ...> construct in the document itself or a referenced DTD.

All of these entities are defined in HTML as well, with one caveat: &apos; is not part of HTML 4, although XHTML 1.0 and HTML5 do define it.
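To make the distinction concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser (my choice for illustration, not something the spec prescribes). It assumes a non-validating parser with no external DTD; the element name r and the nbsp declaration are made up for the example.

```python
import xml.etree.ElementTree as ET

# The five predefined entities parse without any DTD.
predefined_text = ET.fromstring("<r>&lt; &gt; &amp; &apos; &quot;</r>").text

# An HTML-only named entity is undeclared in plain XML, so parsing fails.
try:
    ET.fromstring("<r>&nbsp;</r>")
    nbsp_accepted = True
except ET.ParseError:
    nbsp_accepted = False

# Declaring the entity in an internal DTD subset makes it usable.
declared_doc = '<!DOCTYPE r [<!ENTITY nbsp "&#160;">]><r>a&nbsp;b</r>'
declared_text = ET.fromstring(declared_doc).text

print(repr(predefined_text), nbsp_accepted, repr(declared_text))
```

The same &nbsp; reference that is rejected in the second document is accepted in the third, because the document itself supplies the definition.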

Ondrej
BobG
  • Though as noted in http://stackoverflow.com/questions/2083754/why-shouldnt-apos-be-used-to-escape-single-quotes, the apos escape sequence is NOT part of the HTML spec, and not supported by certain browsers – evnafets May 20 '13 at 22:38
  • @evnafets: "certain browsers" = IE8, the usual suspect – Stefan Steiger Nov 30 '14 at 21:51
2

Yes. HTML4 defines a number of named entities that aren't predefined in XML, such as &nbsp; and &copy;. You can see the full list on the w3.org website. A few, like &gt; and &lt;, are available in both. You can also write &lt; as a numeric character reference: &#60;. As far as I know you can use the numeric form freely in both HTML and XML. See the w3.org link for how to define your own entities in XML documents.
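A quick way to check the numeric-reference claim is the rough sketch below. It uses Python's html.unescape as a stand-in for HTML entity decoding and xml.etree.ElementTree for XML; both are just convenient standard-library tools, and the element name r is arbitrary.

```python
import html
import xml.etree.ElementTree as ET

numeric = "&#60;"  # numeric character reference for '<'

# The numeric form decodes the same way under HTML and XML rules.
from_html = html.unescape(numeric)
from_xml = ET.fromstring("<r>" + numeric + "</r>").text

# HTML named entities decode in HTML, but have no meaning in bare XML.
html_named = html.unescape("&nbsp;&copy;")

print(from_html, from_xml, repr(html_named))
```

So when a document has to survive both HTML and XML processing, numeric references (or the shared named entities) are the safer choice.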

Brian Donovan