34

Even with HTML5 being the path forward for HTML we get two options as developers: XHTML syntax and HTML syntax. I've been using XHTML as my main doctype for 5 or so years so I'm very comfortable with it.

But my question is: given that non-XML syntax will be allowed, is there any reason to stick with a valid XML syntax? Do you gain anything going with one over the other (compatibility, etc.), besides preference? Personally I'll feel a little dirty going back to not closing tags, since closing them is second nature to me now, but would I gain something going back to HTML syntax?

Update: I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and am not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml mime-type), etc.?

Parrots
  • 1
    nothing stopping you from closing all tags with regular HTML... – Evan Teran Jul 02 '09 at 22:10
  • 5
    @Evan - yes there is; it isn't valid! in particular, things like
    – Marc Gravell Jul 02 '09 at 22:12
  • I wonder how something like
    would fly in HTML syntax
    –  Jul 03 '09 at 02:48
  • 2
    HTML5 gives you the *option* of using self-closing tags. It would not be valid HTML4 though. – Alex Barrett Jul 03 '09 at 03:22
  • It would fly in HTML, it's just that
    means the same as
    gt; (even if virtually no browser supports it). The HTML 4 spec says you should avoid that feature (since support sucks)
    – Quentin Jul 03 '09 at 09:10
  • "IE was sometimes finicky with the application/xhtml+xml mime-type" meaning "It asked the user where they wanted to save it unless they installed an obscure plug-in"? :) – Quentin Jul 03 '09 at 09:11
  • 4
    @DavidDorward No, that isn't true. In HTML4 it was *technically* true, as HTML4 was still an SGML-based language. But, as you said, nobody actually followed that. HTML5 is *not* SGML-based, and it doesn't have such a silly rule.
    is exactly the same as
    in the HTML syntax of HTML5.
    – Xanthir Nov 19 '09 at 17:30
  • 1
    @Parrots: See http://mathiasbynens.be/notes/xhtml5 – Mathias Bynens Sep 16 '10 at 11:27

10 Answers

24

The advantage of XHTML syntax is that it is XML. It can be easily parsed, understood and manipulated. The HTML syntax is a lot harder for clients to work with.

Nonsense! The HTML5 spec defines how to parse HTML in a way that is relatively easy to implement, and off-the-shelf parsers are being developed that can be easily integrated into tool chains. It's even possible for an HTML5 parser to be integrated into an XML tool chain in place of an XML parser.

But what you need to understand is that in practice, you're most likely using HTML anyway, even if you think you're using XHTML based on the DOCTYPE. If your content is being served as text/html, instead of application/xhtml+xml or another XML MIME type, then your content will be processed as HTML.

With HTML5, you can choose to use HTML-only syntax, meaning that it is only compatible with being served and processed as text/html; it is not well-formed XML. Or you can use XHTML-only syntax, meaning that it is well-formed XML but uses XML features that are not compatible with HTML. Or you can write a polyglot document, which is conforming and compatible with both HTML and XHTML processing (in principle, this is similar to writing XHTML 1.0 that conforms with the Appendix C guidelines).
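The difference between the three options can be illustrated with a quick well-formedness check. This is only a sketch under assumptions: the sample strings and the helper name `is_well_formed_xml` are invented for the illustration, and Python's `xml.etree.ElementTree` stands in for whatever XML parser a tool chain might use.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-only style: unquoted attribute, <br> without a slash, unclosed <p>.
html_only = "<div><p class=intro>Hello<br>world</div>"

# Polyglot style: quoted attributes, every element closed, <br /> for the void element.
polyglot = '<div><p class="intro">Hello<br />world</p></div>'

# XHTML-only style: a CDATA section in ordinary content is an XML feature.
xhtml_only = "<div><p><![CDATA[a < b]]></p></div>"

for name, sample in [("html_only", html_only),
                     ("polyglot", polyglot),
                     ("xhtml_only", xhtml_only)]:
    print(name, "well-formed XML:", is_well_formed_xml(sample))
```

Only the first sample is rejected by the XML parser. The check says nothing about whether a document is conforming HTML5; it only shows which samples could survive being processed as XML.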

Lachlan Hunt
  • Indeed. This is what all browsers have been doing for their whole lives. – Mehrdad Afshari Jul 02 '09 at 23:23
  • 3
    Lachlan, it is not easy to implement and you know like me that the number of html 5 parsers are still very few compared to XML parsers. – karlcow Jul 03 '09 at 02:26
  • 2
    @Lachlan, you know very well that HTML 5 is still a draft and subject to change. As I understand it, none of the browsers available to the general public today implement the HTML5 parser spec in full, let alone other user agents. On the other hand, XML parsers are ubiquitous. Maybe one day, html5 parsers will be as convenient to use as xml ones, but not yet. Maybe one day, IE will implement application/xhtml+xml and web authors can, if they wish, leave text/html behind. In the meantime, if one wishes, as I do, to parse back ones own web pages, using a polyglot document is the way to go. – Alohci Jul 03 '09 at 08:05
  • karlcow, I said *relatively* easy to implement, and given that html5lib was implemented by a group of people with little to no experience implementing a parser beforehand simply by following the spec, I think my claim is valid. Alohci, yes, I am aware of the instability of HTML5 due to its WD status. But I was addressing the bogus claim that parsing HTML is a lot harder than parsing XML. It's not really relevant that browsers haven't yet finished migrating to fully conforming HTML5 parsers, as their existing parsers handle real world HTML sufficiently in practice anyway. – Lachlan Hunt Jul 03 '09 at 15:15
  • 3
    @Lachlan Hunt: Parsing a less strict syntax is generally more difficult, when comparing to a more formalized syntax. Pragmatically speaking, the two are so close that it is *almost* the same difficulty. What I don't understand is why they didn't prefer XHTML syntax regardless. HTML5 feels like step backward, just calling it as I see it. The new technologies, love `em... The old syntax?? That needed to stay dead. – J. M. Becker Feb 22 '12 at 02:01
18

I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and am not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml mime-type), etc.?

As mentioned in a previous answer, text/html gets parsed as HTML and application/xhtml+xml gets parsed as XML. Thus, you should use the syntax that matches the MIME type you use.

If you are now serving text/html but using XHTML syntax, then you should fix your content to use the HTML5 syntax. You may already be close, since HTML5 allows the XMLesque /> empty element syntax for void elements (elements that are always empty, such as img and br).
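As a rough illustration of why the `/>` on void elements is harmless in text/html, the sketch below feeds both spellings to Python's `html.parser`. Note the assumptions: `html.parser` is a lenient tokenizer from the standard library, not an implementation of the HTML5 parsing algorithm, and the class and function names here are made up for the example.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for "<br />", because the default
        # handle_startendtag() delegates to handle_starttag().
        self.start_tags.append(tag)


def start_tags(markup: str) -> list:
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


html_style = start_tags('<p>One<br>two <img src="a.png" alt=""></p>')
xml_style = start_tags('<p>One<br />two <img src="a.png" alt="" /></p>')

print(html_style)
print(xml_style)
```

Both calls report the same sequence of elements, which matches the point above: for void elements such as img and br, the trailing slash does not change what the HTML side sees.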

If you are now using application/xhtml+xml, IE support would be a reason to switch to text/html and the HTML syntax if you care about supporting IE.

Trying to write polyglot documents that are correct HTML5 and XHTML5 (for serving different MIME types to different browsers with the same payload bytes) is harder than it seems at first sight and not worth the trouble.

hsivonen
8

The HTML5 draft is very clear about which syntax to use:

  • use HTML syntax when sending pages as text/html
  • use XHTML syntax when sending pages as application/xhtml+xml

Reference: http://dev.w3.org/html5/spec/Overview.html#authors-using-xhtml
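The rule above boils down to a lookup from MIME type to syntax. Here is a minimal sketch of that lookup, assuming a hypothetical helper named `syntax_for` that receives a raw `Content-Type` header value; it only covers the two types listed above.

```python
def syntax_for(content_type: str) -> str:
    """Map a Content-Type header value to the syntax the page should use."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    if mime_type == "text/html":
        return "HTML"
    if mime_type == "application/xhtml+xml":
        return "XHTML"
    raise ValueError(f"not an (X)HTML MIME type: {mime_type!r}")


print(syntax_for("text/html; charset=utf-8"))
print(syntax_for("application/xhtml+xml"))
```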

Ionuț G. Stan
  • true, but it doesn't really answer the question of which should be preferred when you have the option of using either content type. – jalf Jul 02 '09 at 23:40
  • it does: use HTML when it is text/html and XHTML when it is application/xhtml+xml. While you can use XHTML with text/html, that is not recommended, and the other way, HTML with application/xhtml+xml, is not possible. – Ionuț G. Stan Jul 02 '09 at 23:47
  • 2
    Sorry but it doesn't really answer my question. I get that the mime-type is what tells the browser what syntax to use -- I was asking which to use myself. I can set the mime-type to be whatever I want, so I know *how* to switch between the two. – Parrots Jul 03 '09 at 00:12
  • 1
    @Parrots, but you know that IE does not support application/xhtml+xml, right? So I doubt you can use whatever mime type you want, except a few cases. – Ionuț G. Stan Jul 03 '09 at 00:26
  • @IonuțG.Stan: From [caniuse](http://caniuse.com/#search=application%2Fxhtml), IE9 and later *do* support `application/xhtml+xml`. – DavidRR May 01 '15 at 17:41
  • @DavidRR it was 2009 when I answered this. Personally, I'm still using text/html and HTML syntax. HTML is fault-tolerant, XHTML is not. – Ionuț G. Stan May 02 '15 at 12:06
  • @IonuțG.Stan: A lot has certainly changed in these past six years. And sure, HTML is fault-tolerant. But [Jeff Atwood argues](http://blog.codinghorror.com/html-validation-does-it-matter/) that there is value in knowing ***why*** your HTML is not valid. (Yes, I realize that Jeff's article also dates back to 2009, but I think the content is still relevant.) – DavidRR May 03 '15 at 22:33
2

When using XHTML you can mix it with other XML content, f.e. MathML, SVG or your own proprietary format, by just changing namespace at some point. Also, you can embed XHTML inside other XML documents.

(well, actually MathML and SVG can be used in non-XML HTML5 too, but they are special-cased)
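To make the namespace switch concrete, here is a small sketch that parses an XHTML document with inline SVG as XML. The document content is invented for the illustration, and Python's `xml.etree.ElementTree` is just one convenient XML parser; nothing here is specific to browsers.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

document = f"""
<html xmlns="{XHTML_NS}">
  <head><title>Mixed namespaces</title></head>
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG_NS}" width="100" height="100">
      <circle cx="50" cy="50" r="40" />
    </svg>
  </body>
</html>
"""

root = ET.fromstring(document.strip())

# ElementTree reports every element as "{namespace}localname",
# so the XHTML and SVG parts stay distinguishable.
paragraph = root.find(f".//{{{XHTML_NS}}}p")
circle = root.find(f".//{{{SVG_NS}}}circle")

print(root.tag)
print(paragraph.text)
print(circle.get("r"))
```

The same approach would work for a proprietary vocabulary: declare its namespace with `xmlns` on the embedded element and look it up under that namespace.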

liori
  • "When using XHTML you can mix it with other XML content, f.e. MathML, SVG or your own proprietary format, by just changing namespace at some point." <- except for IE. – Ionuț G. Stan Jul 02 '09 at 23:27
  • 1
    IE doesn't support HTML5 in the first place though, does it? – jalf Jul 02 '09 at 23:39
  • I always had the freedom not to call IE a web browser. HTML5 was designed for compatibility, so at least some parts of a web page will work. – liori Jul 02 '09 at 23:49
  • @jalf, it does. Well, depends what you mean by support. HTML5 is designed to be backwards compatible. It follows the principle of graceful degradation. – Ionuț G. Stan Jul 02 '09 at 23:52
1

You shouldn't use XHTML to serve content on the Web (or any network including Internet Explorer clients); see "Sending XHTML as text/html Considered Harmful" for the full rationale.

Thomas Broyer
  • That's not true, there are cases where XHTML should be served on the web as application/xhtml+xml, when you specifically want/need to use some of the benefits of XHTML (see further down the article for examples). Usually, though, you will be better off serving HTML as text/html. – Alistair Knock Jul 03 '09 at 15:23
1

I like XHTML because it forces me to write a good page. XHTML has several advantages: browsers parse it faster, and you have to write well-formed XML rather than just HTML. Note that you need to serve the page with the MIME type application/xhtml+xml, or you don't get any of the advantages of the X. The only problem with XHTML is that it won't display in IE8 and earlier.

Orcris
1

Most of the benefits of XHTML have failed to materialise. While I wouldn't recommend it for new projects, XHTML served as text/html seems to be quite manageable and widespread, as long as you follow the compatibility guidelines. It probably isn't worthwhile changing any significant projects back to the HTML serialisation.

Casebash
0

The advantage of XHTML syntax is that it is XML. It can be easily parsed, understood and manipulated. The HTML syntax is a lot harder for clients to work with.

But ultimately, it is just a matter of syntax. Both forms are allowed for HTML5.

jalf
  • 1
    That's not true. XML is not easier to parse than HTML 4.01 Strict, provided that both are valid. The reason behind the self-closing tags in XML is that it's a framework for defining markup languages, so one doesn't have to know beforehand which are the self-closing tags. On the other hand, browsers already know what these tags are, so they know very well that after a
    they should not expect a . That's all.
    – Ionuț G. Stan Jul 02 '09 at 23:32
  • 3
    XPath or XSLT are two ready-made technologies for parsing and manipulating XML. They don't work with HTML. HTML allows more than just unclosed tags, it also allows you to close tags in a different order than they were opened. So no, that's not "all". :) – jalf Jul 02 '09 at 23:38
  • 1
    HTML 4.01 Strict, does not allow you to close tags in a different order. Just that some people did it does not mean it is allowed. The only thing hard about HTML is that it does not enforce draconian rules in the markup. XHTML is either correct or not. So, HTML, as defined in the standards is OK. What we have in the real world is not OK. – Ionuț G. Stan Jul 02 '09 at 23:43
  • 2
    Furthermore, because people think that what they write is XHTML, when it is in fact invalid HTML, they believe XHTML is easy. But there are thousands, or tens of thousands, of invalid XHTML/HTML pages out there with XHTML transitional doctypes. That's because IE does not support XHTML, so they had to send their markup as text/html. So, no XHTML/XML advantages. – Ionuț G. Stan Jul 02 '09 at 23:48
  • 2
    @jalf: I personally use XPath and XSLT with HTML. These technologies are independent of XML. They work on DOM, and both HTML and XML produce equivalent DOM. HTML 5 does not allow tags to be closed in wrong order (it's parse error. HTML 5 never breaks tree structure). – Kornel Jul 04 '09 at 21:52
  • @ Ionuț G. Stan: Believe me the advantages still exist, even If they don't exist on the browser. You can do really cool transforms server side, and use XML tools for anything you want. – J. M. Becker Feb 22 '12 at 02:07
0

Update: I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and am not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml mime-type), etc.?

You have to really consider two things. The language you are writing and the language you are sending. The Web is defined by 3 components:

  • URI
  • A resource - Markup Language (document)
  • A protocol - HTTP (tool for managing information space)

You can write a document with an XML syntax on your desktop such as using XHTML. In this specific environment, if you give the extension ".xhtml" to the filename and open it with your local browser, it will be parsed as XML. If you give the extension ".html" to the filename, it will be parsed as HTML. Basically in your authoring tool, it is XML, but this doesn't matter anymore once you process it with a tool.

On the Web, your resource identified by a URI will be sent with a specific MIME type; these days, most people use text/html. The MIME type defines how the client (browser, search engine bot, etc.) must process your document. If you are using an XML syntax but send it with text/html, the document will be processed by an HTML parser.

To send your documents over the wire as XML, you have to configure your server to send them as application/xhtml+xml. (Note that IE8 and earlier versions do not understand application/xhtml+xml and will offer to save the file instead.)
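How you configure this depends entirely on your server. As one hedged illustration, the sketch below uses the development server from Python's standard library and maps the `.xhtml` extension to application/xhtml+xml; the handler name, the `make_server` helper, and the port are assumptions made for the example, and a production server would have its own configuration mechanism.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XhtmlAwareHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .xhtml files as XML."""

    # extensions_map is consulted when the handler picks a Content-Type.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".html": "text/html",
        ".xhtml": "application/xhtml+xml",
    }


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a local server on the given port."""
    return HTTPServer(("127.0.0.1", port), XhtmlAwareHandler)


if __name__ == "__main__":
    # Serves the current directory until interrupted.
    make_server().serve_forever()
```

With this running, the same file saved as `page.html` would be processed by the browser's HTML parser and as `page.xhtml` by its XML parser, which mirrors the local-file behaviour described earlier.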

The HTML 5 abstract model has been designed so that you can almost write it with either an HTML syntax or an XML syntax in text/html. Almost, because even if you write with an XML syntax (closing empty elements, quotes around attributes, etc.), you will run into trouble with complex pages that use scripting and namespaces, due to the way XML parsers and HTML parsers deal with those.

karlcow
0

2019 UPDATE

The W3C's own words about XHTML:

"A newer specification exists that is recommended for new adoption in place of this specification. New implementations should follow the latest version of the HTML specification."

So, you should use HTML 5.*

Juanma Menendez