70

For this question I needed to find all text nodes under a particular node. I can do this like so:

function textNodesUnder(root){
  var textNodes = [];
  addTextNodes(root);
  [].forEach.call(root.querySelectorAll('*'),addTextNodes);
  return textNodes;

  function addTextNodes(el){
    textNodes = textNodes.concat(
      [].filter.call(el.childNodes,function(k){
        return k.nodeType==Node.TEXT_NODE;
      })
    );
  }
}

However, this seems inelegant in light of the fact that with XPath one could simply query for .//text() and be done with it.
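
For reference, the XPath query mentioned above could look roughly like the sketch below. It relies on `document.evaluate`, which Internet Explorer does not implement, so it would not satisfy the browser list that follows. The name `textNodesViaXPath` and the optional `doc` parameter are made up for illustration; `7` is the numeric value of `XPathResult.ORDERED_NODE_SNAPSHOT_TYPE`.

```javascript
// Hypothetical helper, not part of the original question.
// `doc` is an assumed optional parameter; it defaults to the global document.
function textNodesViaXPath(el, doc) {
  doc = doc || document;
  var ORDERED_NODE_SNAPSHOT_TYPE = 7; // XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  // './/text()' selects every text node that is a descendant of the context node.
  var result = doc.evaluate('.//text()', el, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var a = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    a.push(result.snapshotItem(i));
  }
  return a;
}
```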

What's the simplest way to get all text nodes under a particular element in an HTML document, that works on IE9+, Safari5+, Chrome19+, Firefox12+, Opera11+?

"Simplest" is defined loosely as "efficient and short, without golfing".

CertainPerformance
Phrogz

2 Answers

194

Based on @kennebec's answer, a slightly tighter implementation of the same logic:

function textNodesUnder(node){
  var all = [];
  for (node=node.firstChild;node;node=node.nextSibling){
    if (node.nodeType==3) all.push(node);
    else all = all.concat(textNodesUnder(node));
  }
  return all;
}

However, far faster, tighter, and more elegant is using createTreeWalker so that the browser filters out everything but the text nodes for you:

function textNodesUnder(el){
  var n, a=[], walk=document.createTreeWalker(el,NodeFilter.SHOW_TEXT,null,false);
  while(n=walk.nextNode()) a.push(n);
  return a;
}
Phrogz
  • 13
    @julmot On my computer, looking for all text nodes on this page using Chrome v50, it takes 1900μs using the first technique, but 220μs using the TreeWalker technique. So, 8 or 9 times faster. – Phrogz Apr 19 '16 at 14:59
  • 3
    I had to tweak this in order to exclude the contents of ` – Web_Designer Sep 05 '16 at 00:50
  • 1
    If you're using the TreeWalker method and you want to exclude script or style tags as Web_Designer mentioned, you can pass a filter as the third argument to createTreeWalker – Vinay Pai Jul 26 '17 at 20:44
  • 1
    @VinayPai - Caveat: the `filter` is only run on [`a node that has passed the whatToShow check`](https://developer.mozilla.org/en-US/docs/Web/API/Document/createTreeWalker#Parameters), so in this case you couldn't use the convenient `NodeFilter.SHOW_TEXT`, but instead you'd have to add additional logic to manually filter text nodes by `nodeType` or something. – Sphinxxx Sep 11 '18 at 17:20
  • 2
    @Web_Designer - Alternative while still using `document.createTreeWalker()`: https://gist.github.com/Sphinxxxx/ed372d176c5c2c1fd9ea1d8d6801989b – Sphinxxx Sep 11 '18 at 18:09
  • If anyone is confused why there are four arguments: https://genius.engineering/psa-internet-explorer-requires-all-four-arguments-to-documentcreatetreewalker/ – thdoan Jan 17 '23 at 19:16
  • @Sphinxxx the filter acts as a post-processor, so it takes into account the second argument, i.e., if you pass `NodeFilter.SHOW_TEXT` as the second argument then the filter would further filter those text nodes. – thdoan Jan 17 '23 at 19:38
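
To make the filter discussion in the comments concrete, here is a minimal sketch of the TreeWalker approach with a filter that drops text inside script and style elements. Since the filter is only called for nodes that already passed `SHOW_TEXT`, it looks at each text node's parent. The name `visibleTextNodesUnder` and the optional `doc` parameter are assumptions for illustration, and the numeric constants are the standard `NodeFilter` values written out by hand.

```javascript
var SHOW_TEXT = 4;      // NodeFilter.SHOW_TEXT
var FILTER_ACCEPT = 1;  // NodeFilter.FILTER_ACCEPT
var FILTER_REJECT = 2;  // NodeFilter.FILTER_REJECT

// Hypothetical variant of textNodesUnder that skips <script>/<style> text.
// `doc` is an assumed optional parameter; it defaults to the global document.
function visibleTextNodesUnder(el, doc) {
  doc = doc || document;
  // Called only for text nodes, because whatToShow is SHOW_TEXT.
  function filter(node) {
    var parent = node.parentNode;
    var tag = parent && parent.nodeName ? parent.nodeName.toUpperCase() : '';
    return (tag === 'SCRIPT' || tag === 'STYLE') ? FILTER_REJECT : FILTER_ACCEPT;
  }
  var n, a = [], walk = doc.createTreeWalker(el, SHOW_TEXT, filter, false);
  while ((n = walk.nextNode())) a.push(n);
  return a;
}
```

Checking only the direct parent should be enough here, because the contents of script and style elements are plain text children of those elements.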
6
function deepText(node){
    var A= [];
    if(node){
        node= node.firstChild;
        while(node!= null){
            if(node.nodeType== 3) A[A.length]=node;
            else A= A.concat(deepText(node));
            node= node.nextSibling;
        }
    }
    return A;
}
Phrogz
kennebec
  • 1
    How about `while (node)` without the `!= null`? – Phrogz May 24 '12 at 02:33
  • 3
    Or even `for (node=node.firstChild;node;node=node.nextSibling){ … }` – Phrogz May 24 '12 at 02:43
  • 1
    I was worried that the recursive solution might run into stack limit issues, but [I see now that this is unlikely](http://stackoverflow.com/questions/7826992/browser-javascript-stack-size-limit). – Phrogz May 24 '12 at 02:46
  • 1
    Once you know the first (parent) node is a child node the only possible values for node.nextSibling are another child node or null. – kennebec May 24 '12 at 03:55
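
If the stack-depth concern from the comments ever did matter, the same traversal could be written without recursion (and without the repeated `concat` calls). The following is only a sketch under that assumption; `textNodesUnderIterative` is a made-up name, and it uses nothing beyond `firstChild`, `nextSibling`, `parentNode` and `nodeType`.

```javascript
// Hypothetical non-recursive version of the same depth-first walk.
function textNodesUnderIterative(root) {
  var out = [];
  var node = root.firstChild;
  while (node) {
    if (node.nodeType === 3) out.push(node); // 3 === Node.TEXT_NODE
    if (node.firstChild) {
      // Descend into the first child before visiting siblings.
      node = node.firstChild;
      continue;
    }
    // No children: climb until a next sibling exists or we are back at root.
    while (node && node !== root && !node.nextSibling) node = node.parentNode;
    if (!node || node === root) break;
    node = node.nextSibling;
  }
  return out;
}
```

The result is in document order, like the recursive versions above.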