The Document.createTreeWalker() method can take a NodeFilter object with an acceptNode filter function that tests each node selected by the whatToShow parameter.
For nodes that pass the test, the filter function should return NodeFilter.FILTER_ACCEPT. For the rest, it should return NodeFilter.FILTER_REJECT or NodeFilter.FILTER_SKIP.
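The filter argument may also be a plain function instead of an object with an acceptNode method. The return value matters most when whatToShow includes elements: on a TreeWalker, FILTER_REJECT excludes an element together with its whole subtree, while FILTER_SKIP excludes only the element and still visits its children. Here is a minimal sketch, assuming you want to prune `<script>`, `<style>` and `<textarea>` subtrees. The names `visibleTextFilter` and `collectVisibleText` are hypothetical.

```javascript
// Hypothetical filter: prunes <script>, <style> and <textarea> subtrees
// and accepts only text nodes with non-whitespace content.
function visibleTextFilter(node) {
  if (node.nodeType === Node.ELEMENT_NODE) {
    // REJECT: skip this element and everything inside it.
    // SKIP: skip only this element; its children are still visited.
    return node.matches('script, style, textarea')
      ? NodeFilter.FILTER_REJECT
      : NodeFilter.FILTER_SKIP;
  }
  // With the whatToShow value below, every other node is a text node
  return /^\s*$/.test(node.textContent)
    ? NodeFilter.FILTER_REJECT
    : NodeFilter.FILTER_ACCEPT;
}

// Hypothetical helper: collects the accepted text under `root`
function collectVisibleText(root) {
  const walker = document.createTreeWalker(
    root,
    NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT,
    visibleTextFilter // a plain function works in place of {acceptNode: ...}
  );
  const texts = [];
  while (walker.nextNode()) {
    texts.push(walker.currentNode.textContent);
  }
  return texts;
}
```

Because rejected elements are pruned, this variant does not need a separate selector for their descendants.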
When testing the nodes, you can use the matches(selectorList) method from the DOM Element API with a list of selectors that you don't want to match. Text nodes have no matches() method, so call it on the text node's parent element. Either use a simple list and negate the result (as in the example), or use the :not(selectorList) pseudo-class.
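For comparison, here is a minimal sketch of the :not() form, using the same ignore list as the example that follows. It assumes a browser that supports selector lists and complex selectors such as `a *` inside :not() (Selectors Level 4). The name `filterWithNot` is hypothetical.

```javascript
// The same ignore list as the example's filter, wrapped in :not()
const IGNORED = 'a, a *, script, textarea, .to-ignore';

// Hypothetical variant: matches(':not(...)') is true only when the
// parent element matches none of the ignored selectors
function filterWithNot(node) {
  if (node.parentNode.matches(`:not(${IGNORED})`) &&
      !/^\s*$/.test(node.textContent)) {
    return NodeFilter.FILTER_ACCEPT;
  }
  return NodeFilter.FILTER_REJECT;
}
```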
The following code also filters out empty text nodes and those containing only whitespace, because browsers create whitespace-only text nodes wherever the HTML source has whitespace between element tags (e.g. `<p></p>` may produce from zero to three such text nodes when parsed, depending on the surrounding code). It also pushes the actual text into the array rather than the text node objects.
```javascript
function findTextNodes() {
  const walker = document.createTreeWalker(
    document.body,         // root
    NodeFilter.SHOW_TEXT,  // nodes to include
    {acceptNode: filter}   // NodeFilter object
  );
  const textNodes = [];
  while (walker.nextNode()) {
    textNodes.push(walker.currentNode.textContent);
  }
  return textNodes;
}

// NodeFilter function
function filter(node) {
  // Ignore any node whose parent matches a selector in the list,
  // and nodes that are empty or only whitespace
  if (!node.parentNode.matches('a, a *, script, textarea, .to-ignore') &&
      !/^\s*$/.test(node.textContent)) {
    // If the node passes the test, return the accept value
    return NodeFilter.FILTER_ACCEPT;
  }
  // Otherwise reject it explicitly instead of returning undefined
  return NodeFilter.FILTER_REJECT;
}

const textNodes = findTextNodes();
console.log(textNodes.join('\n'));
```
```css
.to-ignore {
  background-color: yellow;
}
a * {
  color: green;
}
```
```html
<p>It's the end of the world as we know it,<br>
and I feel fine</p>
<a>the in a</a>
<br>
<a><span>the in span in a</span></a>
<span class="to-ignore">in to-ignore</span>
```
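The whitespace check in the filter is the regular expression /^\s*$/. This small sketch shows which strings it treats as blank. The helper name `isBlank` is hypothetical. In JavaScript, \s also matches the non-breaking space (U+00A0), so a text node containing only `&nbsp;` is dropped too. `text.trim() === ''` gives the same result as the regular expression.

```javascript
// Hypothetical helper: true for empty or whitespace-only strings
function isBlank(text) {
  return /^\s*$/.test(text);
}

const samples = ['', '\n    ', '\t\r\n', '\u00a0', 'and I feel fine'];
for (const s of samples) {
  // String.prototype.trim strips the same characters that \s matches
  console.log(JSON.stringify(s), isBlank(s), s.trim() === '');
}
```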
Besides empty and whitespace-only text nodes, the filter function ignores text nodes whose parent element matches any of the following selectors:
- `a`: `<a>` elements
- `a *`: all descendants of `<a>` elements
- `script`: `<script>` elements
- `textarea`: `<textarea>` elements
- `.to-ignore`: elements with class "to-ignore"
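The filter only tests the text node's immediate parent. As a result, `.to-ignore` only catches text that is a direct child of such an element, and the text in `<span class="to-ignore"><b>nested</b></span>` would still be collected. If ignoring everything inside those elements is what you want, one option is Element.closest(), which checks the parent element and all of its ancestors. This also makes the separate `a *` selector unnecessary. The sketch below makes that assumption, and the name `ignoreNestedFilter` is hypothetical.

```javascript
// Selectors whose elements (and everything inside them) should be ignored
const IGNORE_ANCESTORS = 'a, script, textarea, .to-ignore';

// Hypothetical variant: closest() checks the parent element and its
// ancestors, so text at any depth inside an ignored element is rejected
function ignoreNestedFilter(node) {
  const parent = node.parentElement;
  if (!parent || parent.closest(IGNORE_ANCESTORS) || /^\s*$/.test(node.textContent)) {
    return NodeFilter.FILTER_REJECT;
  }
  return NodeFilter.FILTER_ACCEPT;
}

// Usage, e.g. in place of {acceptNode: filter} in findTextNodes():
// document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, {acceptNode: ignoreNestedFilter});
```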