
Note: this is meant to be the canonical post for this question. A number of answers already exist, but descriptions of the various differences are scattered all over the place, and more often than not they also offer opinions on "which one should I use", which I will refrain from here.
If you have more questions to ask, or you know of more differences, feel free to edit.

What is the difference between XHTML and HTML? Isn't XHTML merely a more strict version of HTML? And why are there different versions of XHTML if they all act the same?

JJJ
Mr Lister
  • Is this different from https://stackoverflow.com/questions/2662508/html-4-html-5-xhtml-mime-types-the-definitive-resource ? – Alohci May 22 '19 at 22:59
  • @Alohci The previous one has lots of details about the difference between HTML4 and 5, while this one focuses on XHTML vs HTML in what I hope is complete, yet as concise as possible. Also some of the answers to that one are outdated and would be better off deleted. – Mr Lister May 23 '19 at 06:35
  • I'll be able to update some of the answers there with actual examples though, particularly the bit that says _"Some scripts which have not been prepared properly may work differently or fail in an XHTML environment (..) please expand"_ – Mr Lister May 23 '19 at 06:37

1 Answer


What is the difference between HTML and XHTML?

There are many differences. The main one is that XHTML is HTML in an XML document, and XML has different syntax rules:

  • XML elements are in no namespace by default, so you'll have to declare the HTML namespace, xmlns="http://www.w3.org/1999/xhtml", explicitly in an XHTML document
  • XML is case sensitive and you'll have to use lowercase for tag names and attributes and even the x in hexadecimal character references
  • XML doesn't have optional start and end tags, so you'll have to write out all of them in full
  • Likewise, XML has no notion of void elements, so you'll have to close every void element yourself with a slash (for example, <br />).
  • Non-void elements that have no content can be written as a single empty element tag in XML.
  • XML can contain CDATA sections, sections of plain text delimited with <![CDATA[ .. ]]>; HTML cannot
  • On the other hand, there are no CDATA or PCDATA elements or attributes in XML, so you'll have to escape your < signs everywhere (except in CDATA sections)
  • Quotes around attribute values are not optional in XML, and there is no attribute minimization (name-only attributes)
  • And the XML parser is not as forgiving of errors as the HTML parser.
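
The XML-side rules in this list can be checked with any conforming XML parser. Below is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for a browser's XML parser (that substitution is an assumption, as are the helper name is_well_formed and the sample fragments). It only shows what XML accepts or rejects; it says nothing about how an HTML parser treats the same input.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# (rule, HTML-style spelling, XML-style spelling)
samples = [
    ("void element", "<p>line<br></p>", "<p>line<br /></p>"),
    ("optional end tag", "<ul><li>one<li>two</ul>",
     "<ul><li>one</li><li>two</li></ul>"),
    ("attribute quotes", "<a href=page.html>link</a>",
     '<a href="page.html">link</a>'),
    ("attribute minimization", "<input disabled />",
     '<input disabled="disabled" />'),
    ("mismatched tag case", "<P>text</p>", "<p>text</p>"),
    ("hex reference", "<p>&#XE9;</p>", "<p>&#xE9;</p>"),
]

for rule, html_style, xml_style in samples:
    print(f"{rule}: HTML-style accepted={is_well_formed(html_style)}, "
          f"XML-style accepted={is_well_formed(xml_style)}")

# An empty non-void element may be written as a single empty-element tag.
print(is_well_formed("<div />"))

# Elements are only in the XHTML namespace when it is declared explicitly.
with_ns = ET.fromstring(f'<html xmlns="{XHTML_NS}"><body /></html>')
without_ns = ET.fromstring("<html><body /></html>")
print(with_ns.tag)     # tag name qualified by the XHTML namespace
print(without_ns.tag)  # plain "html", in no namespace
```

Every HTML-style fragment is rejected and every XML-style one is accepted, which is the "not as forgiving" point in practice: the parser stops at the first error instead of recovering.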

Then there are a couple of not XML-related differences:

  • XHTML documents are always rendered in standards mode, never in quirks mode
  • XHTML does not look at <meta> elements in the head to determine the encoding. In fact, the W3C validator flags <meta http-equiv="content-type" ... as an error in XHTML5 files, but not in HTML5 files.
  • Earlier on, mismatches between the DTDs for XHTML 1.0 Strict and HTML 4.01 Strict led to validation issues. The definition for XHTML 1.0 was missing the name attribute on <img> and <form>. This was an error though, fixed in XHTML 1.1.

Note that XHTML documents should be served up with the correct file type, i.e. a .xhtml file extension or an application/xhtml+xml MIME type. You can't really have XHTML in an HTML document, because browsers don't differentiate between the two syntaxes by looking at the content, only by file type.
In other words, if you have an HTML file, its contents are HTML, no matter if it has valid XML in it or not.
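
As a toy model of that rule (not how any particular browser is implemented), the choice of parser can be sketched as a function of the declared type alone. The function name parser_for and the returned labels are made up for illustration, and only the two types mentioned above are handled.

```python
from typing import Optional


def parser_for(content_type: str, body: str) -> Optional[str]:
    """Toy model: choose a parser from the declared type only.

    The body argument is deliberately ignored, to mirror the point that
    the content itself does not influence the decision.
    """
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/xhtml+xml":
        return "xml"
    if mime == "text/html":
        return "html"
    return None  # other types are outside the scope of this sketch


well_formed_xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body /></html>"
)

print(parser_for("text/html; charset=utf-8", well_formed_xhtml))  # html
print(parser_for("application/xhtml+xml", well_formed_xhtml))     # xml
```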

One point about the syntax rules worth mentioning is the casing of tag names. Although HTML documents are case-insensitive, the tag names are actually exposed as uppercase by the DOM. That means that under HTML, a JavaScript command like console.log(document.body.tagName); would output "BODY", whereas the same command under XHTML would output "body".

Isn't XHTML merely a stricter version of HTML?

No; XML has different rules than HTML, but it's not necessarily stricter. If anything, XML has fewer rules!

In HTML, many features are optional. You can choose to put quotes around attribute values or not; in XML you don't have that choice. And in HTML, you have to remember when you have the choice and when you don't: are quotes optional in <a href=http://my-website.com/?login=true>? In XML, you don't have to think about that. XML is easier.

In HTML, some elements are defined as raw text elements, that is, elements that contain plain text rather than markup.
And some other elements are escapable raw text elements, in which references like &#233; will be parsed, but things like <b>bold</b> and <!-- comment --> will be treated as plain text. If you can remember which elements those are, you don't have to escape < signs (you optionally can though). XML doesn't have that, so there's nothing to remember and all elements have the same content type.
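
To illustrate the XML side of this, here is a small sketch that again uses Python's xml.etree.ElementTree as a stand-in for a browser's XML parser (an assumption; the helper script_text is hypothetical). In XML, a <script> element gets no special treatment, so a bare < in its content is an error unless it is escaped or wrapped in a CDATA section.

```python
import xml.etree.ElementTree as ET


def script_text(markup: str):
    """Hypothetical helper: text content of the root element,
    or None if the XML parser rejects the markup."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


raw = "<script>if (a < b) run();</script>"
escaped = "<script>if (a &lt; b) run();</script>"
cdata = "<script><![CDATA[if (a < b) run();]]></script>"

print(script_text(raw))      # None: the bare "<" is not well-formed XML
print(script_text(escaped))  # if (a < b) run();
print(script_text(cdata))    # if (a < b) run();
```

Both accepted spellings yield the same text content, so the script sees identical source either way.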

XML has processing instructions, the most well known of which is the xml declaration in the prolog, <?xml version="1.0" encoding="windows-1252"?>. This tells the browser which version of XML is used (1.0 is the only version that works reliably across browsers, by the way) and which character encoding.

And XML parses comments in a different way. For example, HTML comments can't start with <!--> (with a > as the first character inside); XHTML comments can.
Speaking of comments, with XHTML you can comment out blocks of code inside <script> and <style> elements using <!-- comment -->. Don't try that in HTML. (It's not recommended in XHTML either, because of compatibility issues, but you can.)
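
The XML half of both statements can be observed with Python's xml.dom.minidom, used here as a stand-in for a browser's XML parser (an assumption; the sample markup is made up). This sketch does not attempt to reproduce the HTML parser's behaviour.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

markup = (
    "<div>"
    "<!--> still inside the comment -->"
    "<script><!-- alert(1); --></script>"
    "</div>"
)

doc = minidom.parseString(markup)
div = doc.documentElement

# A comment may begin with ">" in XML; it runs until the first "-->".
first = div.firstChild
print(first.nodeType == Node.COMMENT_NODE)
print(repr(first.data))

# Inside <script>, "<!-- ... -->" is a real comment node in XML,
# so the element ends up with no text content at all.
script = div.getElementsByTagName("script")[0]
print(script.childNodes.length)
print(script.firstChild.nodeType == Node.COMMENT_NODE)
```

Because the script element contains only a comment node, there is no script text left to execute, which is what "commenting out" means here.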

Why are there different versions of XHTML if they all act the same?

They don't! For instance, in XHTML 1.1 you can refer to character entities like &eacute; and &nbsp;, because those entities are defined in the DTD. The current version of XHTML (formerly known as XHTML5) does not have a DTD, so you will have to use numerical references, in this case &#233; and &#160; (or define those entities yourself in the DOCTYPE declaration; the X means eXtensible, after all).
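
The entity behaviour for a document without a DTD can be sketched with Python's xml.etree.ElementTree, used as a stand-in for a browser's XML parser (an assumption; the helper parse_text and the sample documents are hypothetical). The last sample defines the entities in the internal subset of the DOCTYPE declaration.

```python
import xml.etree.ElementTree as ET


def parse_text(markup: str):
    """Hypothetical helper: text content of the root element,
    or None if the XML parser rejects the markup."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


named = "<p>caf&eacute;&nbsp;au lait</p>"
numeric = "<p>caf&#233;&#160;au lait</p>"
self_defined = (
    "<!DOCTYPE p ["
    '<!ENTITY eacute "&#233;">'
    '<!ENTITY nbsp "&#160;">'
    "]>"
    "<p>caf&eacute;&nbsp;au lait</p>"
)

print(parse_text(named))               # None: the entities are undefined
print(repr(parse_text(numeric)))       # numeric references always work
print(repr(parse_text(self_defined)))  # same text, via self-defined entities
```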

Mr Lister
  • "1.0 is the only version there is, by the way" - it isn't: https://www.w3.org/TR/2006/REC-xml11-20060816/ – Quentin May 22 '19 at 18:27
  • @Quentin But have you actually tried putting `version="1.1"` in the XML prolog on an XHTML file? Edit: oh, I see it works in Chrome now. Still not in Firefox or Edge though. – Mr Lister May 22 '19 at 18:29
  • @Alohci You changed "boolean" into "name-only", but AFAIK "boolean" is still the formal name for this type of attribute. So why change it? Other terms may need cleanup - I did notice that WHATWG seems to prefer "raw text elements" to "CDATA elements" these days, but "boolean" is still OK. – Mr Lister May 24 '19 at 19:30
  • @MrLister - "boolean" may be the commonly used term, I don't know, but it's not the term the HTML5 spec uses. In fact, it says "[An attribute value is a string](https://w3c.github.io/html/dom.html#element-definitions-attributes)". I assume that what you were referring to is what the HTML5 spec calls the [Empty attribute syntax](https://w3c.github.io/html/syntax.html#ref-for-attribute-names%E2%91%A0) which is described as "Just the attribute name", so "name-only" seems appropriate. – Alohci May 24 '19 at 20:04
  • HTML5 also does talk about ["boolean attributes"](https://w3c.github.io/html/infrastructure.html#sec-boolean-attributes). But it says "If the attribute is present, its value must either be the empty string or a value that is an ASCII case-insensitive match for the attribute’s canonical name, with no leading or trailing white space." and "A boolean attribute without a value assigned to it (e.g. checked) is implicitly equivalent to one that has the empty string assigned to it (i.e. checked=""). As a consequence, it represents the true value". So both syntaxes support boolean attributes. – Alohci May 24 '19 at 21:38