I need to represent content in a lingua franca, which nowadays means the HTML5 standard. My objective is not to show a page in a web browser: I need to represent only content, with no interface, no layout, and no logic (no JavaScript).

As noted in other questions (including Programmers questions) and in the "HTML vs XHTML" section of the W3C HTML5 Recommendation:

the DOM, the HTML syntax, and the XHTML syntax cannot all represent the same content.

OK, but ~90% can be the same (!), and if I do not need JavaScript, styles, etc., and I can enforce some constraints, it should be 100%. So the question is: which constraints do I need to enforce so that any HTML5 document serialized as XHTML5 represents the same thing, and vice versa (for example, with an XSLT that converts back to the original HTML5 document)?

Is there a "subset of HTML5 elements", or a "subset with some additional constraints", that ensures XHTML5/HTML5 conversions are reversible?

Peter Krauss

1 Answer

Polyglot Markup: A robust profile of the HTML5 vocabulary, which is currently a W3C Candidate Recommendation, defines rules for a document

[…] that is a stream of bytes that parses into identical document trees (with some exceptions, as noted in the Introduction) when processed either as HTML or when processed as XML.

You can find the rules for writing such a document in section 4: Writing HTML documents.
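
To make the idea concrete, here is a minimal Python sketch; it is not taken from the polyglot specification. It feeds one small document, written to follow a few polyglot rules (XHTML namespace on the root, self-closed void elements, no named entities), to both an XML parser and an HTML tokenizer from the standard library. The sample document and the helper names `xml_outline` and `html_outline` are made up for illustration. Note that Python's `html.parser` is not a full HTML5 tree builder, so this only compares element names and text content, which is a much weaker check than comparing DOMs.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample document following a few polyglot rules.
POLYGLOT_DOC = (
    '<!DOCTYPE html>'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
    '<head><meta charset="utf-8"/><title>Demo</title></head>'
    '<body><p>Line one<br/>line two</p></body>'
    '</html>'
)


def xml_outline(doc):
    """Parse as XML; return (element names in document order, all text)."""
    root = ET.fromstring(doc)
    # ElementTree reports tags as "{namespace}name"; keep only the local name.
    tags = [el.tag.split("}", 1)[-1] for el in root.iter()]
    return tags, "".join(root.itertext())


class _OutlineParser(HTMLParser):
    """Collects start-tag names and character data from HTML input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closed tags such as <br/>, because the
        # default handle_startendtag delegates to handle_starttag.
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def html_outline(doc):
    """Tokenize as HTML; return (element names in document order, all text)."""
    parser = _OutlineParser()
    parser.feed(doc)
    parser.close()
    return parser.tags, "".join(parser.text)


if __name__ == "__main__":
    print(xml_outline(POLYGLOT_DOC) == html_outline(POLYGLOT_DOC))
```

A document that breaks the rules, for example one using an unclosed `<br>`, would already fail in `xml_outline` because it is not well-formed XML.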

unor
  • Hmm... there is a W3C text, OK. But the W3C is contradictory (!): either "HTML syntax, and the XHTML syntax *CANNOT* all represent the same content" or they "*CAN*" (?). PS: practice shows (anyone can see it) that they CANNOT. – Peter Krauss Feb 11 '15 at 14:59
  • @PeterKrauss: I think the HTML5 Rec means it like "… unless you voluntarily follow some constraints". But note that even polyglot markup has ["notable exceptions"](http://www.w3.org/TR/2014/CR-html-polyglot-20140717/#principles), e.g., different DOMs are created for some `xml`/`xmlns`/`xlink` attributes. – unor Feb 11 '15 at 15:04
  • "I will voluntarily follow some constraints"... and, sorry! Only now, reading the links more closely, do I understand that "polyglot markup" is a radical solution for my problem. It is a third representation (!). If I use the polyglot representation, all conversions (polyglot-to-XHTML5, XHTML5-to-polyglot, polyglot-to-HTML5, HTML5-to-polyglot) will be reversible (!). So, if I represent my content as polyglot markup, the converted XHTML5 can be converted back to HTML5. That is my conclusion; can you confirm? – Peter Krauss Feb 12 '15 at 01:44
  • @PeterKrauss: I’m not sure I exactly understand your case, so I don’t know if this explanation helps: The *HTML5* specification defines two syntaxes, the HTML syntax and the XHTML syntax. If you follow the rules from the [polyglot markup profile](http://www.w3.org/TR/html-polyglot/), you get *one* HTML5 document that conforms to *both* syntaxes (so this document is a valid HTML document *and* a well-formed XML document), and on top of that, no matter whether your document gets processed as XML or as HTML, it results in (almost) identical DOMs. – unor Feb 12 '15 at 01:54
  • Thanks! (The old title of the Recommendation summarizes it: "[HTML-Compatible XHTML Documents](http://www.w3.org/TR/2011/WD-html-polyglot-20110405/)".) – Peter Krauss Feb 12 '15 at 02:02