XML validation happens against a DTD or a schema, whereas HTML5 allows user-defined elements and data-* attributes (and is itself a living standard, depending on whom you ask). Your suspicions are therefore most likely correct: the two are incompatible. Granted, one could write a DTD or schema that caters to a particular document by accounting for all of its custom elements and attributes, and the document would then validate in the strictest sense of the term, but validating against a schema written to fit the document is not how validation is meant to be used.
The good news is that, in polyglot markup, this won't be an issue. In section 3.1 of the polyglot markup specification, it says:
Polyglot markup results in:
- a valid HTML document. [HTML5]
- a well-formed XML document. [XML10]
- identical DOMs when processed as HTML and when processed as XML, with some notable exceptions: HTML and XML parsers generate different DOMs for some xml (xml:lang, xml:space, and xml:base), xmlns (xmlns="" and xmlns:xlink=""), and xlink (such as xlink:href) attributes. XML requires and HTML5 permits these attributes in certain locations and the attributes are preserved by HTML parsers. The exception must not break the requirement to be a valid HTML document.
Polyglot Markup specifies a Robust Syntax, by which it is meant a syntax that maximizes support and minimizes authoring choice.
However:
Polyglot markup is not constrained:
- to be valid XML. [XML10]
- by conformance to any XML DTD.
This means that polyglot markup conforms to HTML5 by circumstance, but does not need to conform to any XML DTD in order to work. It is simply a serialization of HTML, and not an XML document type in and of itself. The concept of XML validation is in fact completely irrelevant to polyglot markup, just as XML validation is irrelevant to any XML document that doesn't declare conformance to any particular schema.
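
To make the distinction between well-formed and valid concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree, which is a non-validating parser: it checks well-formedness only and never consults a DTD or schema. The sample document, the my-widget element, the data-role attribute, and the is_well_formed helper are all hypothetical names made up for illustration; the sample is meant to look like polyglot markup but has not been checked against the full polyglot specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical document in the style of polyglot markup. It uses a custom
# element (my-widget) and a data-* attribute that no XML DTD describes.
POLYGLOT_DOC = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta charset="utf-8"/>
    <title>Polyglot example</title>
  </head>
  <body>
    <my-widget data-role="demo">Hello</my-widget>
    <br/>
  </body>
</html>"""


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no DTD or schema is involved."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The custom element and attribute do not affect well-formedness.
print(is_well_formed(POLYGLOT_DOC))

# HTML-style markup with an unclosed void element is not well-formed XML.
print(is_well_formed("<p>unclosed<br></p>"))

# The XML parser exposes the custom element like any other element.
root = ET.fromstring(POLYGLOT_DOC)
widget = next(root.iter(f"{{{XHTML_NS}}}my-widget"))
print(widget.attrib["data-role"], widget.text)
```

The first check passes even though no schema anywhere describes my-widget, which is the point: an XML parser only needs the document to be well-formed, and validation would only come into play if the document declared conformance to a particular DTD or schema.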