In HTML, block elements can't be children of inline elements. Browsers, however, are happy to accept this HTML:
<i>foo <h4>bar</h4> fizz</i>
and render it as expected; DOMParser doesn't choke on it either.
But it's not valid and is therefore hard to convert to another schema. Pandoc parses the above as (option 1):
<i>foo </i><h4>bar</h4> fizz
which is at least valid, but not faithful to the original. Another approach would be (option 2):
<i>foo </i><h4><i>bar</i></h4><i> fizz</i>
Is there a way to force DOMParser to parse more strictly, so that the result is option 1 or 2? (It doesn't seem possible.)
Alternatively, what would be the best approach to deal with this, that is, given the first string, get option 1 or 2 as a result? Is there a JS parser that does this (and other strict enforcing of the standard)?
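To make the goal concrete, here is a minimal sketch of the option 2 transformation, written against a made-up tree model rather than the real DOM: a text node is a plain string and an element is `{ tag, children }`. The names `normalizeInline`, `serialize`, `BLOCK_TAGS` and `INLINE_TAGS` are hypothetical, the tag lists are deliberately incomplete, and attributes and text escaping are ignored.

```javascript
// Hypothetical tree model: text node = string, element = { tag, children }.
// The tag lists below are illustrative, not the full sets from the HTML standard.
const BLOCK_TAGS = new Set([
  "div", "p", "h1", "h2", "h3", "h4", "h5", "h6",
  "ul", "ol", "blockquote", "pre", "table",
]);
const INLINE_TAGS = new Set(["i", "b", "em", "strong", "span", "a", "u", "code"]);

// Returns an array of nodes, because one inline element may be split
// into several siblings when it contains block children.
function normalizeInline(node) {
  if (typeof node === "string") return [node];

  // Normalize the subtree first, so nested violations are fixed bottom-up.
  const children = node.children.flatMap(normalizeInline);
  if (!INLINE_TAGS.has(node.tag)) return [{ tag: node.tag, children }];

  const out = [];
  let run = []; // consecutive non-block children that keep the inline wrapper
  const flush = () => {
    if (run.length > 0) {
      out.push({ tag: node.tag, children: run });
      run = [];
    }
  };

  for (const child of children) {
    if (typeof child !== "string" && BLOCK_TAGS.has(child.tag)) {
      flush();
      // Option 2: hoist the block and push the inline wrapper inside it.
      const wrapped = normalizeInline({ tag: node.tag, children: child.children });
      out.push({ tag: child.tag, children: wrapped });
    } else {
      run.push(child);
    }
  }
  flush();
  return out;
}

// Naive serializer for the sketch: no attributes, no escaping.
function serialize(node) {
  if (typeof node === "string") return node;
  return `<${node.tag}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

const input = { tag: "i", children: ["foo ", { tag: "h4", children: ["bar"] }, " fizz"] };
const result = normalizeInline(input).map(serialize).join("");
console.log(result); // <i>foo </i><h4><i>bar</i></h4><i> fizz</i>
```

In a browser, the same idea could presumably be applied to the document returned by DOMParser by walking its nodes instead of this toy model; getting option 1 instead would mean pushing the hoisted block unchanged rather than wrapping its children.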
Edit: it turns out that the HTML parser of at least Chrome (78.0.3904.108) behaves differently when the content is in a p instead of, say, a div. When the HTML above is in a p, it gets parsed as option 2! But it's left as is when inside a div.
So I guess the question is now: how can the behavior seen inside a p be enforced inside a div?