Using a TreeWalker to retrieve non-Javascript text nodes

Question

This question shows how to get all text nodes inside a document, but that approach also returns the JavaScript text inside <script> tags. What is the best way to filter out the nodes that contain JavaScript code?

score 13 · Accepted Answer · edited Jun 30 '17 at 12:24

All text nodes inside <script> tags have one thing in common: their parent is a <script> element. So you can test for that while iterating:

if (node.parentNode.nodeName !== 'SCRIPT') {
  // node is not script content: keep it
}
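The same parent check also works without a TreeWalker. Below is a minimal sketch, not part of the original answer, that walks `childNodes` recursively and keeps every text node whose parent is not a <script>. The helper names `isScriptText` and `collectText` are hypothetical, and upper-casing `nodeName` is an added assumption to cover documents served as XHTML, where element names are not upper-cased.

```javascript
// Node type values from the DOM spec (Node.ELEMENT_NODE and Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// True when the text node's parent is a <script> element.
// toUpperCase() is an assumption that also covers XHTML, where nodeName stays lowercase.
function isScriptText(node) {
  const parent = node.parentNode;
  return parent != null && parent.nodeName.toUpperCase() === 'SCRIPT';
}

// Recursively collect the values of all text nodes under `root`,
// leaving out text that sits directly inside a <script>.
function collectText(root, out = []) {
  for (const child of Array.from(root.childNodes)) {
    if (child.nodeType === TEXT_NODE) {
      if (!isScriptText(child)) out.push(child.nodeValue);
    } else if (child.nodeType === ELEMENT_NODE) {
      collectText(child, out);
    }
  }
  return out;
}
```

In a page you would call `collectText(document.body)`.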

Another approach is to pass a filter to createTreeWalker so that script text is never returned:

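var rejectScriptTextFilter = {
  acceptNode: function(node) {
    // Accept text whose parent is not a <script>; reject script text explicitly
    // instead of falling through and returning undefined.
    if (node.parentNode.nodeName !== 'SCRIPT') {
      return NodeFilter.FILTER_ACCEPT;
    }
    return NodeFilter.FILTER_REJECT;
  }
};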

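// createTreeWalker(root, whatToShow, filter); the old fourth argument
// (entityReferenceExpansion) is obsolete and ignored by browsers.
var walker = document.createTreeWalker(
  document.body,
  NodeFilter.SHOW_TEXT,
  rejectScriptTextFilter
);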

var node;
var textNodes = [];

while ((node = walker.nextNode())) {
  textNodes.push(node.nodeValue);
}

console.log(textNodes);

Example markup:
<script> var str = "script here"; </script>
<p> text here </p>
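A variation on the filter above, sketched here as an assumption rather than taken from the answer: show elements as well as text, and return FILTER_REJECT for <script> elements themselves. A TreeWalker does not descend into rejected nodes, so script text is never visited, while FILTER_SKIP on every other element hides the element but still walks its children. `rejectScriptSubtrees` and `textOutsideScripts` are hypothetical names, and the fallback constants exist only so the filter can be loaded outside a browser.

```javascript
// Use the browser's NodeFilter when present; the fallback values match the DOM spec.
const NF = typeof NodeFilter !== 'undefined'
  ? NodeFilter
  : { FILTER_ACCEPT: 1, FILTER_REJECT: 2, FILTER_SKIP: 3, SHOW_ELEMENT: 0x1, SHOW_TEXT: 0x4 };

// Elements are never returned: <script> is rejected (its subtree is pruned),
// other elements are skipped so the walker still visits their children.
// Any text node that reaches the filter is accepted.
function rejectScriptSubtrees(node) {
  if (node.nodeType === 1) {
    return node.nodeName.toUpperCase() === 'SCRIPT' ? NF.FILTER_REJECT : NF.FILTER_SKIP;
  }
  return NF.FILTER_ACCEPT;
}

// Browser-only: collect text outside <script> elements under an element `root`.
function textOutsideScripts(root) {
  const walker = root.ownerDocument.createTreeWalker(
    root,
    NF.SHOW_ELEMENT | NF.SHOW_TEXT,
    rejectScriptSubtrees // a plain function is accepted as the filter
  );
  const texts = [];
  let node;
  while ((node = walker.nextNode())) {
    texts.push(node.nodeValue);
  }
  return texts;
}
```

The DOM accepts a plain function as the filter argument, which is why this sketch passes `rejectScriptSubtrees` directly instead of an object with `acceptNode`.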

score 0 · Answer 2 · answered May 12 '16 at 05:41


You could clone the original document, remove the <script> elements from the clone, and then iterate over the remaining text nodes of the cloned document.

guest271314
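The clone-and-remove approach from this answer could look like the minimal sketch below. It assumes a browser DOM and an element root such as `document.body`; `textWithoutScripts` is a hypothetical helper name. Because the <script> elements are removed from the copy only, the live page is left untouched.

```javascript
// Hypothetical helper: assumes a browser DOM and an Element `root` (e.g. document.body).
function textWithoutScripts(root) {
  // Deep-clone so the live page and its scripts stay untouched.
  const clone = root.cloneNode(true);
  // Detach every <script> element from the clone.
  for (const script of Array.from(clone.querySelectorAll('script'))) {
    script.remove();
  }
  // Walk the text nodes that remain in the clone.
  const walker = root.ownerDocument.createTreeWalker(clone, NodeFilter.SHOW_TEXT);
  const texts = [];
  let node;
  while ((node = walker.nextNode())) {
    texts.push(node.nodeValue);
  }
  return texts;
}
```

For example, `textWithoutScripts(document.body)`. The trade-off is that the whole subtree is copied first, whereas the TreeWalker filter works on the live DOM without a copy.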
