Does the spec require that setting innerText/textContent of node automatically escape HTML entities?

Question

I'm trying to display a code snippet, which could be in an HTML/XML-like language. To escape HTML entities, I found that the following code works:

<!-- html code -->
<pre><code id="foo"></code></pre>

// js code
document.querySelector('#foo').textContent = '<p>a paragraph</p>';

innerText works as well, but innerHTML doesn't, which is expected.

I've tested it against the latest Chrome and Firefox, but I'm not sure whether the auto-escaping of textContent and innerText is well supported in other major browsers. The relevant DOM and HTML API specs seem a bit vague on this.

So is this behavior required by the spec, and therefore a safe way to escape strings?

When you use `innerHTML`, the HTML tags are not displayed; they are parsed instead. For example, `<b>bar</b>` will produce **bar** and not the literal text "`<b>bar</b>`". — Cypherjac, Jun 13 '20 at 13:51
The DOM spec seems pretty clear to me: the text is installed as the value of a new text node in the DOM. That means that no HTML interpretation is performed. The HTML special characters don't need to be escaped; that's only necessary if they need to make it past an HTML parser, which doesn't happen when you set `textContent`. — Pointy, Jun 13 '20 at 13:51
@Pointy Thanks for the quick response. However, I couldn't find the sentence `the text is installed as the value of a new text node in the DOM` in the specs. But anyway (maybe because I'm not familiar enough with HTML terminology), even this sentence seems rather vague to me regarding HTML entities. — wlnirvana, Jun 13 '20 at 14:00
A text node in the DOM is just that: a node with a string of characters. Once you get to that level, HTML is not relevant: any character can be part of that string. HTML entities are required in HTML source code because the code is going to be parsed and interpreted by an HTML parser. That does not happen when you are creating a text node with JavaScript. — Pointy, Jun 13 '20 at 14:02
@Pointy I think I got the idea. Would you mind summarizing your comment into an answer so that I can accept it? — wlnirvana, Jun 13 '20 at 14:33
OK answer typed in, hopefully it makes sense because I haven't had enough coffee this morning. — Pointy, Jun 13 '20 at 14:41

score 1 · Accepted Answer · answered Jun 13 '20 at 14:41

It's important to understand the difference between:

- Feeding an HTML source file to the HTML parser in a browser in order to construct (and render) a DOM according to your wishes, and
- Manipulating an existing DOM with JavaScript after it has been constructed.

When you set the `textContent` of an existing DOM node, you're using a browser-supplied API that takes whatever text you provide and replaces the node's children with a single new text node whose content is exactly that string of characters. When you do that, HTML is not relevant at all: the HTML parser is not consulted. HTML entity notation is therefore not necessary, and in fact if you tried to use it, the text node would contain the literal entity notation (for example, the four characters `&lt;` rather than `<`).

Of course, in HTML source code, you have to use HTML entity notation to encode special characters, but that's because you're feeding the content through the HTML parser. Once the parser is finished, the text nodes that exist in the DOM show no traces of those HTML entities: the parser interpreted them, created strings of characters, and created plain text nodes as per the wishes you expressed in the HTML source code.
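
To make the two paths concrete, here is a rough sketch rather than anything definitive. The names `escapeTextForSerialization` and `compareAssignments` are hypothetical helpers invented for this illustration; they are not browser APIs and do not appear in the original question. The first one imitates, as far as I understand the HTML serialization rules, the escaping applied to text node data when markup is read back (for example via `innerHTML`), which is probably why it can look as if `textContent` "escaped" something. The second one assumes it is handed a real browser `document`.

```javascript
// Hypothetical helper (not a browser API): imitates the escaping that HTML
// serialization applies to text node data when markup is read back,
// e.g. through the innerHTML getter. The stored string itself is never changed.
function escapeTextForSerialization(data) {
  return data
    .replace(/&/g, '&amp;') // must run first so later replacements are not re-escaped
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: assigns the same string through textContent and through
// innerHTML, then reports what each element ends up containing.
function compareAssignments(doc, source) {
  const viaText = doc.createElement('code');
  viaText.textContent = source; // no parsing: the string is stored as-is in a text node

  const viaHtml = doc.createElement('code');
  viaHtml.innerHTML = source; // parsed as markup: tags become elements, entities are decoded

  const textNode = viaText.firstChild; // null if source was the empty string
  return {
    textNodeData: textNode ? textNode.data : '',
    textReadBackAsHtml: viaText.innerHTML,
    htmlChildNodeNames: Array.from(viaHtml.childNodes, (n) => n.nodeName),
    htmlTextContent: viaHtml.textContent,
  };
}

// The DOM part only runs where a document exists (i.e. in a browser).
if (typeof document !== 'undefined') {
  console.log(compareAssignments(document, '<p>a paragraph</p>'));
  console.log(compareAssignments(document, '&lt;p&gt;'));
}
```

In a browser, `textNodeData` should come back identical to the string that was passed in, while `textReadBackAsHtml` should match what `escapeTextForSerialization` returns for it. In other words, the escaping only shows up when the DOM is turned back into markup, not when the text node is created.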
