
I wrote a word-counting function and found a discrepancy: it produces different results when counting the words in an HTML string, depending on whether the element containing the HTML is part of document.body. For example:

html = "<div>Line1</div><div>Line2<br></div>";

document.body.insertAdjacentHTML("afterend", '<div id="node1"></div>');
node1 = document.getElementById("node1");
node1.style.whiteSpace = 'pre-wrap';
node1.innerHTML = html;

node2 = document.createElement('div');
node2.style.whiteSpace = 'pre-wrap';
node2.innerHTML = html;

The white-space: pre-wrap style is applied so that the code in the html variable is rendered, in terms of line-breaks, consistently across browsers. In the above:

node1.innerText     // is "Line1\nLine2\n" which counts as two words.
node2.innerText     // is "Line1Line2" which counts as only one word.

My word count function is:

function countWords(s) {
    s = (s+' ').replace(/^\s+/g, '');   // remove leading whitespace only
    s = s.replace(/\s/g, ' ');          // change all whitespace to spaces
    s = s.replace(/[ ]{2,}/gi,' ')+' '; // change 2 or more spaces to 1
    return s.split(' ').filter(String).length;
}
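
For comparison, here is a more compact sketch that should give the same counts for ordinary strings. The name countWordsCompact is hypothetical and not part of the original code; it trims the string and splits on runs of whitespace instead of normalizing spaces step by step.

```javascript
// Hypothetical alternative to countWords: split on runs of whitespace.
function countWordsCompact(s) {
    // String(s) mirrors the implicit string coercion of (s + ' ') in the original.
    var words = String(s).trim().split(/\s+/);
    // "".split(/\s+/) yields [""], so drop empty entries.
    return words.filter(Boolean).length;
}

console.log(countWordsCompact("Line1\nLine2\n")); // 2
console.log(countWordsCompact("Line1Line2"));     // 1
```

Either version depends on the line breaks actually being present in the string it receives, which is the point of the question.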

If I then did something like this in the Web Console: node1.after(node2);

node2.innerText     // is changed to "Line1\nLine2\n" which counts as two words.

My questions are:

  1. Why is the white-space: pre-wrap style not being applied to node2.innerText before node2 is inserted into the document.body?

  2. If node2 has to be part of document.body in order to get a white-space: pre-wrap styled node2.innerText value, how do I do that without having to make node2 visible?

  3. I'm curious. When I create a node element with createElement, where does that node element reside? It doesn't appear to be viewable in a Web Console Inspector inside or outside of the <html> tag, and I can't find it in the document object.

This is what tipped me off that the discrepancy had something to do with whether the node element is in the document.body or not: javascript createElement(), style problem.

SKisby
  • When you create an element using `createElement` it isn’t inserted into the DOM yet. That’s why you can’t find the element when inspecting the DOM. – Terry Jan 30 '22 at 01:33
  • @Terry, I know that. My curiosity question is where is the node element if it's not "inserted into the DOM"? – SKisby Jan 30 '22 at 02:16

2 Answers


Indeed. When the element is attached to the DOM, HTMLElement.innerText takes the rendered output into account: you could say, the visible text. Elements that are not attached are not rendered. Their CSS properties exist but are not applied.

If you want consistent results between attached and non-attached elements, use Node.textContent.

For more information, see https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText

Noam
  • Thank you for your answer. Element.textContent is not used because it does not show line-breaks, and I need to be able to see those line-breaks so that the words in the originating Element.innerHTML (which are separated only by line-breaks) can be properly counted. – SKisby Jan 30 '22 at 02:21
  • The linebreaks don't exist in the content though, they are only added by the CSS rendering of the <div> elements. (I actually don't think the whitespace is relevant, it's the <div> rendered as display: block;) An alternative would be to go through the child elements and add linebreaks to your string accordingly, but if you want to work with the CSS rendering you can hiddenly add it to the document. Something like this: node.style.position = 'fixed'; node.style.top = '-1000px'; document.append(node); str = node.innerText; node.remove(); – Noam Jan 30 '22 at 02:44
  • My workaround is something similar: insert a div element after the `</body>` tag (so that, although unseen, it will not mess up any visible layouts). Keep that div element visible (so that its `innerText` will render), but make sure that the element's background is none, its borderColor is transparent, and its (text) color is transparent. – SKisby Jan 30 '22 at 05:08
  • I wouldn't trust adding elements _after_ the body element to render properly; you should add it _at the end_ of the body element (and my code above is mistaken: it should be `document.body.append()`, not `document.append()`). `position: fixed;` should prevent it from affecting any layout, and then `top: -1000px;` reasonably hides it, but yeah, you can probably use transparency too. – Noam Jan 30 '22 at 05:24
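
As a rough sketch of the alternative mentioned in the comments above (going through the child elements and adding line breaks yourself, with no rendering involved), something like the following could work. The function name textWithBreaks and the BLOCK_TAGS list are assumptions made for illustration; which elements count as block-level really depends on the page's CSS, and this does not try to reproduce innerText in general.

```javascript
// Assumption: these tags are treated as block-level. A real page's CSS may differ.
var BLOCK_TAGS = { DIV: true, P: true, LI: true, UL: true, OL: true, BLOCKQUOTE: true };

// Hypothetical helper: collects text from a node tree, adding "\n" for <br>
// and at the end of each block-level element. Works on detached elements too.
function textWithBreaks(node) {
    if (node.nodeType === 3) return node.nodeValue;  // text node
    if (node.nodeType !== 1) return '';              // skip comments etc.
    if (node.nodeName === 'BR') return '\n';
    var out = '';
    for (var i = 0; i < node.childNodes.length; i++) {
        out += textWithBreaks(node.childNodes[i]);
    }
    // Close a block with a line break unless it already ends with one.
    if (BLOCK_TAGS[node.nodeName] && out.charAt(out.length - 1) !== '\n') {
        out += '\n';
    }
    return out;
}

// In a browser:
//   var node2 = document.createElement('div');
//   node2.innerHTML = "<div>Line1</div><div>Line2<br></div>";
//   textWithBreaks(node2);   // node2 never has to be attached to the document
```

Because it only reads nodeType, nodeName, nodeValue and childNodes, the element never needs to be inserted into the page, at the cost of hard-coding which tags produce line breaks.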

In follow-up to my question above, I needed to count the words in html text strings like this: <div>Line1</div><div>Line2<br></div>, where the word count matches what it would be if that html were rendered in the displayed DOM.

To summarize what others have said, when you create an element using createElement it isn’t inserted into the DOM yet and can’t be found when inspecting the DOM. Before the element is inserted into the DOM, the CSS properties exist but are not executed, so there is no rendering. When the element is inserted into the DOM, the CSS properties are executed, and the element is rendered according to the CSS.

Here's the html-string-to-rendered-html-text function I ended up using. This function strips the html tags but retains the "white space" so that the words can then be counted (with consistency across browsers, including IE 11).

var html = "<div>Line1</div><div>Line2<br></div>";

// Display the html string
var htmlts = document.getElementById("htmlts");
htmlts.innerText = html;

// Display a DOM render of the html string
var node1 = document.getElementById("node1");
node1.style.whiteSpace = 'pre-wrap';
node1.innerHTML = html;

// Display the innerText of the above DOM render 
var node1ts = document.getElementById("node1ts");
node1ts.innerText = node1.innerText;

// Display the results of the htmlToText function 
var node2ts = document.getElementById("node2ts");
node2ts.innerText = htmlToText(html);

// Adapted from https://stackoverflow.com/a/39157530
function htmlToText(html) {
    var temp = document.createElement('div');
    temp.style.whiteSpace = 'pre-wrap';
    temp.style.position = "fixed";            // Overlays the normal flow
    temp.style.left = "0";                    // Placed flush left
    temp.style.top = "0";                     // Placed at the top
    temp.style.zIndex = "-999";               // Placed under other elements
    // opacity = "0" works for the entire temp element, even in IE 11.
    temp.style.opacity = "0";                 // Everything transparent

    temp.innerHTML = html;                        // Render the html string
    document.body.parentNode.appendChild(temp);   // Places just before </html>
    var out = temp.innerText;
//  temp.remove();                            // Throws an error in IE 11
    // Solution from https://stackoverflow.com/a/27710003
    temp.parentNode.removeChild(temp);            // Removes the temp element
    return out;
}
<html lang="en-US">
<body>
HTML String: <code id="htmlts"></code><br><br>
Visible Render of HTML String (for comparison): <div id="node1"></div><br>
Visible Render Text String: <code id="node1ts"></code><br>
Function Returned Text String: <code id="node2ts"></code><br>
</body>
</html>

If you prefer to have the temporary element inserted inside the body element, change document.body.parentNode.appendChild to document.body.appendChild.

As Noam had suggested, you can also use temp.style.top = "-1000px";.

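To answer my curiosity question: before the element is "inserted into the DOM" it is not in a Shadow DOM. It exists only in memory as a detached node. It still belongs to the document (its ownerDocument is the document), but it has no parent, so it is not part of the document tree that the Inspector shows.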
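
One way to look at where a freshly created element sits is to check a few standard node properties from the Web Console. The helper below is only a sketch, and the name describeNode is hypothetical. Note that isConnected is not available in IE 11.

```javascript
// Hypothetical helper: reports how a node relates to the document tree.
function describeNode(node) {
    return {
        connected: node.isConnected === true,        // part of the document tree? (no IE 11 support)
        hasParent: node.parentNode !== null,         // attached to any parent node?
        hasOwnerDocument: node.ownerDocument != null // associated with a document?
    };
}

// In a browser console:
//   var el = document.createElement('div');
//   describeNode(el);  // { connected: false, hasParent: false, hasOwnerDocument: true }
//   document.body.appendChild(el);
//   describeNode(el);  // { connected: true, hasParent: true, hasOwnerDocument: true }
```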

SKisby