
I've heard that el.innerText || el.textContent can yield unreliable results, which is why I've always insisted on using the following function instead:

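function getText(node) {

    // Text nodes (nodeType 3) hold their content in .data.
    if (node.nodeType === 3) {
        return node.data;
    }

    var txt = '';

    // Visit each child in document order and concatenate its text.
    for (var child = node.firstChild; child; child = child.nextSibling) {
        txt += getText(child);
    }

    return txt;

}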

This function walks every node inside an element and concatenates the contents of all text nodes it finds, including those nested in descendant elements, in document order:

E.g.

<div id="x">foo <em>foo...</em> foo</div>

Result:

getText(document.getElementById('x')); // => "foo foo... foo"
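To check the traversal outside a browser, here is a minimal sketch that runs the same logic (written with a for loop) against plain objects. The helpers textNode and elementNode are hypothetical stand-ins, not DOM APIs; they only supply the four properties the function reads (nodeType, data, firstChild, nextSibling).

```javascript
// Hypothetical stand-ins for DOM nodes: only the properties getText reads.
function textNode(data) {
    return { nodeType: 3, data: data, firstChild: null, nextSibling: null };
}

function elementNode(children) {
    var el = { nodeType: 1, firstChild: children[0] || null, nextSibling: null };
    // Link the children into a sibling chain, as the DOM does.
    for (var i = 0; i < children.length - 1; i++) {
        children[i].nextSibling = children[i + 1];
    }
    return el;
}

function getText(node) {
    // Text nodes hold their content in .data.
    if (node.nodeType === 3) {
        return node.data;
    }
    var txt = '';
    // Visit each child in document order and concatenate its text.
    for (var child = node.firstChild; child; child = child.nextSibling) {
        txt += getText(child);
    }
    return txt;
}

// Mirrors <div id="x">foo <em>foo...</em> foo</div>
var div = elementNode([
    textNode('foo '),
    elementNode([textNode('foo...')]),
    textNode(' foo')
]);

var result = getText(div); // "foo foo... foo"
```

Because the mock tree contains exactly the whitespace written into it, this only exercises the traversal itself; it says nothing about how a real browser parses whitespace into text nodes.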

I'm quite sure there are issues with using innerText and textContent, but I've not been able to find a definitive list anywhere and I am starting to wonder if it's just hearsay.

Can anyone offer any information about the possibly lacking reliability of textContent/innerText?

EDIT: Found this great answer by Kangax -- 'innerText' works in IE, but not in Firefox

James

1 Answer


It's all about line breaks and whitespace: browsers are very inconsistent in this regard, Internet Explorer especially. Doing the traversal yourself is a sure-fire way to get identical results in all browsers.

John Resig
  • Thanks John. I'm surprised that there's pretty much no mention of this problem elsewhere on the net... even QuirksMode doesn't mention it. – James Apr 16 '10 at 15:29
  • Traversal doesn't guarantee the same results in all browsers. For example, in IE (unlike other browsers) the content of a ` – Tim Down Sep 14 '10 at 14:24