jQuery parses raw HTML with paragraph wrong

Question

[Screenshot: "parses wrong"]

The screenshot shows the Firebug watch window.

Why does it parse almost the same HTML wrong? I expected just one element in the second row, not an array of elements.

`<p>` is only allowed to contain inline elements, not blocks like `div`. jQuery isn't to blame. Build the same structure out of raw HTML and inspect the results; it'll be the same. – Paul Roub, Aug 19 '13 at 21:15
Did you get the answer you were looking for? If so, click to accept one of the answers. Good question, +1 – Rikard, Aug 20 '13 at 07:40

Sergio · Accepted Answer · 2013-08-20T06:11:28.627

The browser is not wrong. <p><div></div></p> is invalid HTML.

The browser parses the two snippets differently because <p> elements may only contain inline elements (phrasing content).

Both <p> and <div> are block elements, but a <p> cannot contain a <div>, because a <div> is not phrasing content. So when the browser reads that code, it finds the <p> element and then an unexpected <div>. Browsers are very tolerant of markup errors, so the browser closes the <p> and moves on to the <div> element. Then comes the closing </p> (also invalid HTML, because its opening tag is missing), so it is read as a new element.

In the first case you have nested elements, so the browser shows one element.
In the second case you have three elements at the same level of the DOM tree, so the browser's answer is an array of elements.

Both render, but the invalid one can produce unexpected results. How the browser will interpret invalid markup, and how CSS will then apply to it, is difficult to predict.

So, the browser reads/parses the code as: <p></p><div></div><p></p>, giving you different results.
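One way to confirm this in your own console is to list the tag names of whatever the parse returned. A minimal sketch, assuming a hypothetical helper named `topLevelTags` (it is not part of jQuery) that accepts any array-like collection of elements:

```javascript
// Hypothetical helper, not a jQuery API.
// Accepts anything array-like whose items expose `tagName`:
// a jQuery object, a NodeList, or a plain array of elements.
function topLevelTags(collection) {
  return Array.from(collection, (node) => String(node.tagName).toLowerCase());
}

// In a browser console with jQuery loaded (not executed here):
//   topLevelTags($('<p><div></div></p>'));
```

With the invalid markup from the question, the list should contain three entries (p, div, p), matching the structure described above, while properly nested markup yields a single entry.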

Worth reading:

W3 / HTML5 spec:
p – paragraph
div – generic flow container.

MOZILLA DEVELOPER NETWORK:
MDN: p element (check "Permitted content")
MDN: block-level elements

edited Aug 20 '13 at 06:11

answered Aug 19 '13 at 21:14

Could you give me some reference on valid HTML? – kseen Aug 19 '13 at 21:15
@kseen a `<p>` tag is for text. – iConnor Aug 19 '13 at 21:16
Side note: `<p>` is a block element, and is also meant for having text inside. Having the `<div>` inside, which is also a block element, is more than likely why the DOM is creating the elements like so. – Mark Pieszak - Trilon.io Aug 19 '13 at 21:17
http://stackoverflow.com/questions/8397852/why-p-tag-cant-contain-div-tag-inside-it – Barbara Laird Aug 19 '13 at 21:17
“`<p><div></div></p>` is invalid HTML, that is why.” Although that is the source of why, it is not the why. – Kissaki Aug 19 '13 at 21:36
@kseen, just updated answer with some links and explanation. Hope it helps. – Sergio Aug 20 '13 at 05:40

Kissaki · Answer 2 · 2013-08-19T21:59:07.850

The result is not wrong in either case.

The <p> element may only contain phrasing content. However, <div> is not phrasing content (it is flow content). (Simplified: <p> may contain inline elements, but <div> is a block element.) Thus, the HTML from your second example is invalid (that is, not standards-conforming).

As a result, the browser's HTML-to-DOM parser (which jQuery triggers, of course) handles the HTML as follows:

Identify <p> block being opened
Identify <div> block being opened
Notice a div block is invalid within the previously opened <p>
Close the previous <p> block
…

So the equivalent HTML would be <p></p><div></div><p></p>, which is valid HTML. In other words, the parser corrects the HTML for you.
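The steps above can be sketched as a toy model. This is a simplification under stated assumptions, not the browser's actual parsing algorithm: the function name `repairMarkup`, the short list of block-level tags, and the restriction to bare tags (no attributes, text, or void elements) are all made up for illustration.

```javascript
// Toy model of two error-recovery rules; NOT the real HTML parsing algorithm.
// Assumption: the input consists only of bare tags such as <p> or </div>.
// Assumption: this short list stands in for "tags that close an open <p>".
const CLOSES_P = new Set(['div', 'p', 'ul', 'ol', 'table', 'section']);

function repairMarkup(html) {
  const out = [];   // repaired tag sequence
  const open = [];  // stack of currently open element names
  const tagPattern = /<(\/?)([a-z][a-z0-9]*)>/gi;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const isEndTag = match[1] === '/';
    const name = match[2].toLowerCase();
    const current = open[open.length - 1];
    if (!isEndTag) {
      // Rule 1: a block-level start tag implicitly closes an open <p>.
      if (CLOSES_P.has(name) && current === 'p') {
        open.pop();
        out.push('</p>');
      }
      open.push(name);
      out.push(`<${name}>`);
    } else if (current === name) {
      // Ordinary, matching end tag.
      open.pop();
      out.push(`</${name}>`);
    } else if (name === 'p') {
      // Rule 2: a stray </p> with no open <p> becomes an empty paragraph.
      out.push('<p>', '</p>');
    }
    // Any other unmatched end tag is simply dropped in this model.
  }
  // Close whatever is still open at the end of the input.
  while (open.length > 0) out.push(`</${open.pop()}>`);
  return out.join('');
}
```

Here `repairMarkup('<p><div></div></p>')` returns `<p></p><div></div><p></p>`, while a valid nesting such as `<p><span></span></p>` (a hypothetical example, not taken from the question) comes back unchanged.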

Because we now have three top-level elements rather than nested elements with one top-level element, you get an array of DOM elements rather than the single element you expected.

Web browsers are very robust against non-standards-conforming HTML. The behaviour you noticed and pointed out here is one of many examples where the parser makes a best-effort attempt to make sense of invalid HTML.
