I wonder about the number of web pages I encounter that are HTML files, but that wear an XHTML DOCTYPE declaration.
Why are people doing this? What do they hope to achieve? Why not reserve the XHTML doctype declaration for actual XHTML files?

Or am I missing something?

Edit: there is some confusion about what "actual XHTML files" are; to demonstrate that the difference is not caused by the DOCTYPE declaration, compare this file to this one. The first is HTML, the second is XHTML, although the contents are identical; only the file types differ. Both display fine in compliant browsers, but the first one is parsed with the HTML parser and the second one with the XML parser.

Mr Lister
  • What do you mean by "actual XHTML files"? Ones with valid XHTML content? – James Allardice Jan 15 '12 at 17:53
  • No, files with an XHTML filetype; .xhtml suffix; application/xhtml+xml MIME type... – Mr Lister Jan 15 '12 at 18:05
  • Although they have now converted to HTML5, for many years W3Schools did it like that, and advocated the approach. While they were not the only cause, and I couldn't comment on individual decisions, either directly or indirectly they must have influenced a lot of web sites. – Alohci Jan 15 '12 at 18:54
  • @Alohci: You mean with regards to just slapping on the XHTML doctype and forgetting everything else? – BoltClock Jan 15 '12 at 19:01
  • @BoltClock - not exactly. Just that they marked up their pages as XHTML and served them as `text/html`. Neither their description of the conversion, nor anything else in the XHTML pages of their site, ever discussed the importance or relevance of mime types at all. – Alohci Jan 15 '12 at 19:43

4 Answers

Why put an XHTML doctype declaration on HTML files? What does that do?

All that does is tell markup validators that they're about to validate an XHTML document, as opposed to a regular, SGML-rooted, HTML document. It describes the content, or more specifically the markup that follows, but nothing else.

Why are people doing this? What do they hope to achieve? Why not reserve the XHTML doctype declaration for actual XHTML files?

Or am I missing something?

Kind of. What actually happened was that people weren't aware that just putting an XHTML doctype declaration on top of an HTML document didn't automatically transform it into an XHTML document, although admittedly that was what everybody was hoping for.

You see, most web applications out there aren't configured to serve XHTML documents as application/xhtml+xml, instead opting to serve pages as just text/html. (It's typically because of the .html file extension more than anything else, really; generally speaking, servers do correctly apply application/xhtml+xml to documents with .xhtml or .xht as the extension, but only static sites that actually make use of the file format will benefit from this.) That leads browsers to decide that they received a regular HTML document, and so that tag soup parsing nonsense we've all come to know and love inevitably ensues.
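
To make that chain concrete, here is a rough Python sketch of it: extension decides the MIME type, MIME type decides the parser. The extension table and both function names are hypothetical and only mirror the behaviour described above; real servers and browsers have far more rules than this.

```python
import os

# Hypothetical extension table, mirroring the server behaviour described above.
EXTENSION_TO_MIME = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xht": "application/xhtml+xml",
}


def mime_type_for(filename):
    """Pick the Content-Type a simple static server might send for a file."""
    _, ext = os.path.splitext(filename.lower())
    return EXTENSION_TO_MIME.get(ext, "application/octet-stream")


def parser_for(mime_type):
    """Pick the parser from the MIME type alone; the doctype plays no part."""
    if mime_type == "application/xhtml+xml":
        return "xml"
    if mime_type == "text/html":
        return "html"
    return None  # neither parser applies in this sketch


for name in ("index.html", "index.xhtml"):
    mime = mime_type_for(name)
    print(name, "->", mime, "->", parser_for(mime))
```

Notice that nothing in `parser_for` ever looks at the document itself, which is the whole point: an XHTML doctype inside `index.html` changes nothing.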

Note that it doesn't matter even if you have a meta tag like this on your XHTML document:

<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />

Browsers will ignore that, and only look at the actual HTTP Content-Type header that was sent along with the XHTML document.
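
As an illustration only, the decision can be modelled like this in Python. The function names are made up for this sketch; the one real library call is `email.message.Message.get_content_type()`, used here just to strip parameters such as `charset` from the header value.

```python
from email.message import Message


def media_type(content_type_header):
    """Return the bare media type from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def chosen_parser(http_content_type, markup):
    """Decide between the XML and the HTML parser.

    `markup` is accepted but deliberately never inspected: a doctype or a
    <meta http-equiv> element inside it has no say in the decision.
    """
    if media_type(http_content_type) == "application/xhtml+xml":
        return "xml"
    return "html"


page = (
    '<meta http-equiv="Content-Type" '
    'content="application/xhtml+xml; charset=utf-8" />'
)
print(chosen_parser("text/html; charset=utf-8", page))              # html
print(chosen_parser("application/xhtml+xml; charset=utf-8", page))  # xml
```

The same page yields a different answer depending only on the header argument, never on the meta tag it contains.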

To make matters worse, Internet Explorer, the most-used browser during XHTML's heyday, never properly supported the application/xhtml+xml MIME type until version 9 was finally released: instead of parsing the markup, constructing the DOM and rendering the page, all it would do was ask for a file download. That doesn't make a very usable XHTML page!

So, guess what we all had to live with until HTML5 became cool?

This, along with things like IE6 going quirky on pages with the XML declaration before the doctype declaration, is also one of the biggest factors leading to XHTML's downfall (along with XHTML 1.1 never gaining widespread usage, and XHTML 2.0 being canceled in favor of HTML5).

BoltClock
  • Actually, I like XHTML. If I want to make sure my pages are correct, I don't have to upload them and then unleash the validator on them, all I have to do is try to load them into my browser. Sure, I can serve up HTML5 documents with the XHTML doctype, but that is not the same. For one thing, I wouldn't know if the browsers would use the XML or the HTML parser. – Mr Lister Jan 15 '12 at 19:33
  • @Mr Lister: I like it too. You can just serve the correct MIME type, though, and modern browsers (including IE9+) will treat your documents as XHTML. (On a side note, "HTML5 documents with the XHTML doctype" doesn't really make sense :) – BoltClock Jan 15 '12 at 19:39
  • Darn, too late to edit. I meant HTML5 with the XHTML content-type! – Mr Lister Jan 15 '12 at 20:02

Most people use the XHTML doctype because they read it in an old book somewhere or read it on a forum but otherwise are using it for no technical reason they are aware of. Hardly anyone uses it properly by serving it as application/xhtml+xml. Serving XHTML pages as text/html means "tag soup" or "broken html". It should not be done but browsers generally handle it well.
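
A small Python sketch of why the difference goes unnoticed: a lenient HTML parser quietly accepts markup that a strict XML parser rejects outright. Python's `html.parser` and `xml.etree.ElementTree` are only stand-ins for a browser's two parsers here, and the sample markup and helper names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Acceptable to a lenient HTML parser, but not well-formed XML:
# the <br> and <p> elements are never closed.
MARKUP = "<html><body><p>Hello<br>world</body></html>"


class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_html(markup):
    """Feed the markup to the HTML parser and return the tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parse_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(parse_as_html(MARKUP))  # ['html', 'body', 'p', 'br']
print(parse_as_xml(MARKUP))   # False
```

Served as text/html, the sloppy version just works; served as application/xhtml+xml, the same bytes would be an error.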

You are correct in your wondering about this. It drives me crazy.

Rob

I assume that you're asking why people are serving XHTML documents as HTML, by using the text/html MIME type instead of application/xhtml+xml.

Mostly, it's because of a misguided understanding of compatibility: lots of browsers simply don't understand the application/xhtml+xml MIME type, which has caused people to simply serve it as HTML to overcome this. Since browsers often don't complain about what they get, and people don't tend to research a lot, most people assume that the browsers just treat the XHTML-doctyped document as XHTML, even though it was served as HTML. But they don't; they treat it as HTML. Since the two languages are so much alike, people rarely notice the difference.

So no, you're not missing anything; it's very bad practice. Nowadays, after HTML5, luckily, it seems to be becoming less common.

kba
  • Most people writing XHTML are serving it as `text/html`, at which point you lose most of the advantages. All browsers, for instance, treat it as HTML in that case. Have a read of [Sending XHTML as text/html Considered Harmful](http://hixie.ch/advocacy/xhtml) – robertc Jan 15 '12 at 18:05
  • "why people are serving XHTML documents as HTML, by using the text/html MIME type instead of application/xhtml+xml." is another way of saying it, yes. So, OK. I'm not sure where you got the downvote from; probably someone who doesn't fully understand. – Mr Lister Jan 15 '12 at 19:08
  • @Mr Lister: The answer was edited into something completely different after the downvote. Either the downvoter never followed up, or... they don't understand :) – BoltClock Jan 15 '12 at 19:51

The hilarious thing about XHTML is that because IE didn't understand the XML mimetype (application/xhtml+xml) at the peak of XHTML's popularity, most people never actually used the XML part of it as IE8 and lower refuse to render the content.

This meant that millions of sites think they are using standards compliant XHTML, when in fact they are being parsed as malformed/weird HTML4.

Luckily HTML5 came along and properly defined the parsing of documents, removing much of the ambiguity that surrounded XHTML (all that transitional and strict rubbish).

People who add the XML prolog before the doctype are doing themselves an extra disservice, as a comment before the doctype will cause old IE to use quirks mode, which among other things brings back the old box-model in IE6 and below. This undoubtedly has contributed to the mass hate of IE6, as in quirks mode it has significant bugs that cause modern layouts to be completely broken, rather than just lacking in newer features.

The short answer is that in this industry many people just copy and paste code without understanding it.

Rich Bradshaw
  • IE understood XML just fine. It was the XHTML part it did not support. – Rob Jan 15 '12 at 18:09
  • And it's not a "hilarious thing about XHTML". It's damn sad and a perfect example of how IE always holds back the web. – Rob Jan 15 '12 at 18:12
  • @Rob, true. I'm also not sure why they didn't address this for so long. I almost suspect that they didn't really want the web to go towards XML for whatever reason. – Rich Bradshaw Jan 15 '12 at 18:25
  • @BoltClock - What do you mean, they only did it for HTML5? From a browser's point of view HTML5 is backward compatible with earlier versions. There's no suggestion that earlier XHTML versions are not supported too. Indeed, my web site serves XHTML 1.0 as application/xhtml+xml to IE9 and it's quite happy with it. – Alohci Jan 15 '12 at 19:04
  • @RichBradshaw "Luckily HTML5 came along and properly defined the parsing of documents" that's not really fair. XHTML also was a very nice and unambiguous standard, at least if you don't count XHTML 1.0. – Mr Lister Jan 15 '12 at 19:11
  • @Alohci: Never mind... you're right, what I said didn't make sense. – BoltClock Jan 15 '12 at 19:12
  • @RichBradshaw "Luckily HTML5 came along and properly defined the parsing of documents" HTML 4 and earlier parsing was also properly defined. No browser vendor cared to implement the parsing properly initially, and later on the web had become such a mess that fixing the parsing of HTML would mean that hardly any sites would show up as expected. –  Jan 15 '12 at 19:37