Acceptance of HTML5 Polyglot served as application/xhtml+xml

Question

In terms of browser support and HTML5 compliance (assuming the page is actually well-formed XML), how practical is it to serve an HTML5 polyglot page with the application/xhtml+xml HTTP Content-Type header?

In the past I served XHTML with a text/html header instead, because otherwise some browsers did not render the page at all, or rendered it with some odd behavior.

Does the HTML5 standard even require browsers to support the application/xhtml+xml content type? What is the actual state of support across browsers? What are today's drawbacks of serving with application/xhtml+xml?

always `text/html`, but most important is your encoding type — , Oct 27 '14 at 13:20

Alohci · Accepted Answer · 2014-10-27T14:45:22.997


No, HTML5 does not require browsers to support either application/xhtml+xml or text/html.

It merely says:

> For compatibility with existing content and prior specifications, this specification describes two authoring formats: one based on XML (referred to as the XHTML syntax), and one using a custom format inspired by SGML (referred to as the HTML syntax). Implementations must support at least one of these two formats, although supporting both is encouraged.

Since IE9, application/xhtml+xml has been supported in all browsers of note.

Assuming you can author well-formed XML, which really isn't all that hard, the biggest gotcha is that not all Unicode characters are valid XML characters. You must therefore always sanitize any user input that might be echoed to the screen, stripping characters that are not valid in XML; otherwise your web page will fail to render correctly (or at all).
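As a rough illustration of that sanitization step, here is a minimal Python sketch. It assumes the XML 1.0 `Char` production is the definition of "valid XML character", and the function names `strip_invalid_xml_chars` and `to_xhtml_text` are hypothetical, chosen only for this example. It is one possible approach, not a complete input-handling strategy.

```python
import html
import re

# Matches anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This excludes most C0 control characters, lone surrogates, U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that an XML 1.0 parser would reject."""
    return _INVALID_XML_CHARS.sub("", text)


def to_xhtml_text(user_input: str) -> str:
    """Strip invalid XML characters, then escape markup-significant ones.

    html.escape() handles &, <, >, and (with quote=True) both quote
    characters, so the result is safe in element content and attributes.
    """
    return html.escape(strip_invalid_xml_chars(user_input), quote=True)


if __name__ == "__main__":
    # A form feed (U+000C) is legal in HTML input but not in XML 1.0.
    print(to_xhtml_text("5 < 6 \x0c & more"))
```

Stripping is done before escaping so that the escape step only ever sees characters an XML parser will accept. Whether to drop invalid characters silently or replace them with U+FFFD is a design choice left open here.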

Also, third-party JS libraries are not always polyglot-compatible. In particular, some rely on document.write(), which isn't supported for XML documents.

edited Oct 27 '14 at 14:45

answered Oct 27 '14 at 13:50


"not all Unicode characters are valid XML characters, so you must always have additional sanitization of user input" All Unicode characters are valid. I believe what you're getting at is that XML only defines 5 character entity references: quot, apos, lt, gt, amp. Regardless, user input should always be sanitized (removing unwanted data) if necessary and escaped (making data safe for the context). If you get XML parsing errors on a page that was previously working, it generally means you forgot to escape and are leaving yourself open to a vulnerability. That's why I test using XML parsing. – Chinoto Vokro Mar 09 '16 at 21:14
@chinoto - No I wasn't referring to named character entity references. Valid XML characters omit the surrogate blocks, U+FFFE, and U+FFFF. – Alohci Mar 09 '16 at 22:05
Oh yes, forgot about those guys. But to be clear, it seems that they are only disallowed when unpaired or paired improperly. – Chinoto Vokro Mar 14 '16 at 21:20
