I'm trying to display a code snippet that may be written in an HTML/XML-like language. While looking for a way to escape HTML entities, I found that the following code works:

<!-- html code -->
<pre><code id="foo"></code></pre>
// js code
document.querySelector('#foo').textContent = '<p>a paragraph</p>';

innerText works as well, but innerHTML doesn't, which is expected.

I've tested it against the latest Chrome and Firefox, but I'm not sure whether the auto-escaping done by textContent and innerText is well supported in other major browsers. The relevant DOM and HTML API specs seem a bit vague on this.

So is this behavior required by the spec, and is it therefore a safe approach to escaping strings?

wlnirvana
  • When you use `innerHTML`, the HTML tags are not displayed; they are parsed instead. For example, `<b>bar</b>` will render as **bar** and not as the literal text "`<b>bar</b>`". – Cypherjac Jun 13 '20 at 13:51
  • The DOM spec seems pretty clear to me: the text is installed as the value of a new text node in the DOM. That means that no HTML interpretation is performed. The HTML special characters don't need to be escaped; that's only necessary if they need to make it past an HTML parser, which doesn't happen when you set `textContent`. – Pointy Jun 13 '20 at 13:51
  • @Pointy Thanks for the quick response. However, I couldn't find the sentence `the text is installed as the value of a new text node in the DOM` in the specs. In any case (maybe because I'm not familiar enough with HTML terminology), even this sentence seems rather vague to me regarding HTML entities. – wlnirvana Jun 13 '20 at 14:00
  • A text node in the DOM is just that: a node with a string of characters. Once you get to that level, HTML is not relevant: any character can be part of that string. HTML entities are required in HTML source code because the code is going to be parsed and interpreted by an HTML parser. That does not happen when you are creating a text node with JavaScript. – Pointy Jun 13 '20 at 14:02
  • @Pointy I think I got the idea. Would you mind summarizing your comment into an answer so that I can accept it? – wlnirvana Jun 13 '20 at 14:33
  • OK answer typed in, hopefully it makes sense because I haven't had enough coffee this morning. – Pointy Jun 13 '20 at 14:41

1 Answer

It's important to understand the difference between:

  • Feeding an HTML source file to the HTML parser in a browser in order to construct (and render) a DOM according to your wishes, and
  • Manipulating an existing DOM with JavaScript after a DOM has been constructed

When you set the textContent of an existing DOM node, you're using a browser-supplied API that will take whatever text you provide and create a new DOM node of type text with the given string of characters as its content. When you do that, HTML is not relevant at all: the HTML parser is not consulted. HTML entity notation is therefore not necessary, and in fact if you tried to use it you'd end up with the text node containing the literal HTML entity notation.
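
To make that concrete, here is a minimal sketch. The `showSource` helper is a hypothetical name introduced only for illustration, and the demo assumes the `<code id="foo">` element from the question exists on the page. It is meant as an illustration of the behavior described above, not a definitive implementation.

```javascript
// Hypothetical helper (not from the question): show raw source text in an element.
function showSource(element, source) {
  // Assigning textContent stores the string as plain text;
  // the HTML parser is not involved.
  element.textContent = source;
  return element;
}

// Browser-only demo; skipped where there is no DOM (for example in Node.js).
if (typeof document !== 'undefined') {
  const code = document.querySelector('#foo'); // assumes the markup from the question
  if (code) {
    // Displayed literally, angle brackets and all.
    showSource(code, '<p>a paragraph</p>');
    // Also displayed literally: the reader sees the characters &lt;p&gt;, not <p>.
    showSource(code, '&lt;p&gt;');
  }
}
```

The second call is the case mentioned above: entity notation passed to textContent is not decoded, because nothing ever parses it.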

Of course, in HTML source code, you have to use HTML entity notation to encode special characters, but that's because you're feeding the content through the HTML parser. Once the parser is finished, the text nodes that exist in the DOM show no traces of those HTML entities: the parser interpreted them, created strings of characters, and created plain text nodes as per the wishes you expressed in the HTML source code.
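
By contrast, if you do build a string that will go through the HTML parser (for example via innerHTML, or in markup generated on a server), the escaping becomes your job. A rough sketch of what that looks like for text content follows; `escapeHtmlText` is a hypothetical helper name, and it only covers text placed between tags. Attribute values would also need their quotes escaped.

```javascript
// Hypothetical helper: escape a string so that an HTML parser
// reads it back as plain text content rather than as markup.
function escapeHtmlText(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must come first, or the entities added below get re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// The snippet from the question, made safe for use inside HTML source.
const escaped = escapeHtmlText('<p>a paragraph</p>');
console.log(escaped); // &lt;p&gt;a paragraph&lt;/p&gt;
```

Setting textContent avoids this step entirely, which is why it is the simpler choice when all you want is to display source code.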

Pointy