Why can browsers infer certain omitted HTML elements, but not all omitted elements required to form valid markup?

Question

Consider the following invalid HTML, where <tr> is a direct child of <div>:

<div>
  <tr></tr>
</div>
<script>
  // Runs after the markup above has been parsed: logs 0
  console.log(document.getElementsByTagName('tr').length);
</script>

The <tr> element does not get added to the DOM.

Now consider the equally invalid HTML, where <tr> is a direct child of <table>:

<div>
  <table>
    <tr></tr>
  </table>
</div>
<script>
  // Runs after the markup above has been parsed: logs 1
  console.log(document.getElementsByTagName('tr').length);
</script>

This time, the <tr> element does get added to the DOM.

Note that I have deliberately omitted the <tbody> in the second snippet, which is required to form valid markup. If omitted, <tbody> is automatically added by the browser, as is noted in this question.

This answer mentions the official W3C documentation's fairly extensive list of which tags are optional, but why are these particular tags optional? Considering the browser is smart enough to automatically add the <tbody> element that I have omitted, why is it not smart enough to add the <table> element as well? There is no possible ambiguity, as <table> is the only valid parent for <tbody>.

Why can <tbody> be inferred from <tr>, but not <table>? Can only one level of DOM hierarchy be inferred?

It is not invalid to have `<tr>` as a direct child of `<table>`: https://html.spec.whatwg.org/multipage/tables.html#the-tr-element — Kaiido, Oct 20 '17 at 01:57
_"which is required to form valid markup "_ - no, it is not. You are linking back into the year 1998 here. Get with the ... current millennium, so to speak. https://www.w3.org/TR/html5/tabular-data.html#the-tbody-element: _"A tbody element's start tag may be omitted if the first thing inside the tbody element is a tr element, and if the element is not immediately preceded by a tbody, thead, or tfoot element whose end tag has been omitted."_ This has nothing to do with "browser smartness", but with _it was deliberately specified this way_. — CBroe, Oct 20 '17 at 02:12
And the extensive list of optional tags you are referring to consists mostly of _end_ tags. Determining correctly how or when to close the currently open element is a lot easier than determining what the _correct_ parent of an element would be, especially for an element like `tr`, which could have several different element types as its parent. — CBroe, Oct 20 '17 at 02:16

score 3 · Accepted Answer · answered Oct 20 '17 at 01:57

Historically, tables were created with just a table element and rows, with no tbody or thead elements at all.

Even the reference you pointed to, which supposedly says tbody is "required", does not in fact say that at all. The very next sentence says the start tag is optional if the first element after the `<table>` start tag is a `tr`.

Also see the official specification here:

https://www.w3.org/TR/html5/tabular-data.html#the-tbody-element

It states the same thing:

A tbody element's start tag may be omitted if the first thing inside the tbody element is a tr element
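As a rough illustration, the two conditions of the spec's tbody omission rule (the full sentence is quoted in CBroe's comment on the question) can be written as a predicate. The function name `canOmitTbodyStartTag` and the way the surrounding markup is described (tag-name strings plus an `endTagOmitted` flag) are hypothetical; this is a sketch of the rule as worded, not how any browser implements it, and it ignores details such as comments and whitespace.

```javascript
// Hypothetical helper (not a browser API): applies the two conditions of the
// spec sentence "A tbody element's start tag may be omitted if ...".
//   firstChildTag: tag name of the first thing inside the tbody, or null
//   previous: the element immediately before the tbody, e.g.
//             { tag: "thead", endTagOmitted: true }, or null if there is none
function canOmitTbodyStartTag(firstChildTag, previous) {
  // Condition 1: the first thing inside the tbody must be a tr element.
  if (firstChildTag !== "tr") return false;
  // Condition 2: no immediately preceding tbody/thead/tfoot whose end tag
  // was omitted; otherwise the tr would just continue that earlier section.
  const sections = ["tbody", "thead", "tfoot"];
  if (previous && sections.includes(previous.tag) && previous.endTagOmitted) {
    return false;
  }
  return true;
}

console.log(canOmitTbodyStartTag("tr", null)); // true
console.log(canOmitTbodyStartTag("td", null)); // false
console.log(canOmitTbodyStartTag("tr", { tag: "thead", endTagOmitted: true })); // false
console.log(canOmitTbodyStartTag("tr", { tag: "thead", endTagOmitted: false })); // true
```

The second condition is what keeps the omission unambiguous: if the previous section's end tag were also missing, a leading tr would simply be read as part of that earlier section, so there would be no way to tell that a new tbody was intended.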

The optionality of tags comes largely from the fact that HTML used to be written much more loosely, with no concept of an empty tag like <br /> and, very commonly, no closing tags for elements such as <li>.

After HTML 4 there was an attempt to create a much stricter XHTML standard that dropped most (or all?) of this optionality and enforced strict XML conformance for HTML. It never fully took off, and HTML5 went in the opposite direction, codifying the fact that HTML is not necessarily XML.

Ahh, interesting. I thought the documentation stating that the `<tbody>` "*may be omitted*" meant that it may be omitted by the **coder**, and the browser would automatically add it for you **if** omitted. Considering there is ambiguity in the relationship between `<tr>` and `<tbody>` / `<thead>`, I'm surprised that `<tbody>` may be omitted, but not `<table>` itself (considering there is no ambiguity between `<tbody>` and `<table>`). — Obsidian Age, Oct 20 '17 at 02:02
@ObsidianAge HTML tables were introduced in HTML 3.0. At the time, the only elements allowed inside `TABLE` were `CAPTION` and `TR`. `tbody` and `thead` were not introduced until later. — Samuel Neff, Oct 20 '17 at 02:13
@ObsidianAge The HTML 3 tables spec is here: https://www.w3.org/MarkUp/html3/tables.html and the HTML 2 spec is here: http://www.ietf.org/rfc/rfc1866.txt (where you can see that no table element was defined yet). — Samuel Neff, Oct 20 '17 at 02:15

sideshowbarker · Answer 2 · 2017-10-20T02:09:05.563

In this specific case:

<div>
  <tr></tr>
</div>

…the reason that tr element doesn’t end up in the DOM is that the HTML parsing algorithm requires HTML parsers to ignore it completely.

The relevant part of the HTML spec for that case is the Tree construction section, specifically its "in body" insertion mode subsection, which says:

↪ A start tag whose tag name is one of: "caption", "col", "colgroup", "frame", "head", "tbody", "td", "tfoot", "th", "thead", "tr"
Parse error. Ignore the token.

While in contrast, for this case:

<div>
  <table>
    <tr></tr>
  </table>
</div>

…the relevant part of the spec is the "in table" insertion mode subsection, which says:

↪ A start tag whose tag name is one of: "td", "th", "tr"
Clear the stack back to a table context. (See below.)

Insert an HTML element for a "tbody" start tag token with no attributes, then switch the insertion mode to "in table body".

Reprocess the current token.

…and the "in table body" insertion mode subsection says:

↪ A start tag whose tag name is "tr"
Clear the stack back to a table body context. (See below.)

Insert an HTML element for the token, then switch the insertion mode to "in row".
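To make that chain of modes concrete, here is a toy tree builder that models only the branches quoted in this answer, plus the "table" start tag in "in body" (which inserts the table and switches to "in table"). It is a hypothetical sketch, not a real parser: it takes start-tag names only and skips end tags, text, attributes, and every other branch of the algorithm. Fed the start tags of the two snippets from the question, it reproduces what the question observed.

```javascript
// Toy model of a few HTML tree-construction branches (hypothetical sketch,
// not a real parser). Input: start-tag names only; end tags, text,
// attributes and all other branches of the spec are left out.
const IGNORED_IN_BODY = new Set([
  "caption", "col", "colgroup", "frame", "head",
  "tbody", "td", "tfoot", "th", "thead", "tr",
]);

function buildTree(startTags) {
  const body = { tag: "body", children: [] };
  const stack = [body]; // the stack of open elements
  let mode = "in body";

  const current = () => stack[stack.length - 1];
  const insert = (tag) => {
    const el = { tag, children: [] };
    current().children.push(el);
    stack.push(el);
  };
  // Simplified "clear the stack back to a ... context": pop until `tag` is current.
  const clearBackTo = (tag) => {
    while (current().tag !== tag) stack.pop();
  };

  const queue = [...startTags];
  while (queue.length > 0) {
    const tag = queue.shift();
    if (mode === "in body") {
      if (IGNORED_IN_BODY.has(tag)) continue; // parse error: ignore the token
      insert(tag); // every other tag: plain insert (a big simplification)
      if (tag === "table") mode = "in table";
    } else if (mode === "in table" && ["td", "th", "tr"].includes(tag)) {
      clearBackTo("table");
      insert("tbody"); // the implied tbody, with no attributes
      mode = "in table body";
      queue.unshift(tag); // reprocess the current token
    } else if (mode === "in table body" && tag === "tr") {
      clearBackTo("tbody");
      insert("tr");
      mode = "in row";
    } else {
      throw new Error(`<${tag}> in "${mode}" is not modelled`);
    }
  }
  return body;
}

// Helpers for inspecting the result.
function countTags(node, tag) {
  const self = node.tag === tag ? 1 : 0;
  return node.children.reduce((n, child) => n + countTags(child, tag), self);
}
const firstPath = (node) =>
  node.tag + (node.children.length ? " > " + firstPath(node.children[0]) : "");

console.log(countTags(buildTree(["div", "tr"]), "tr"));          // 0
console.log(countTags(buildTree(["div", "table", "tr"]), "tr")); // 1
console.log(firstPath(buildTree(["div", "table", "tr"])));
// body > div > table > tbody > tr
```

The `queue.unshift(tag)` line is how this sketch models "Reprocess the current token": the same tr is handled a second time, now in "in table body" mode. In "in body" mode, by contrast, the quoted rule discards a tr outright rather than creating a table for it, which is why the first snippet ends up with no tr at all.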

So, in general, for any question about why HTML parsers handle a given start tag or end tag in a certain way in a particular context, the answer is that some subsection of the HTML parsing algorithm in the HTML spec explicitly defines how parsers must handle that tag in every context where it might be found.

