I need to exclude some text nodes. For that I used this jQuery selector:

$('body :not(:has(*)):not(script):not(textarea):not(textarea *):not(a *):not(a)')

The function I use is:

function findAllTextNodes(n) {
  var walker = n.ownerDocument.createTreeWalker(n, NodeFilter.SHOW_TEXT);
  var textNodes = [];
  while (walker.nextNode()) {
    var parent = walker.currentNode.parentNode;
    if (parent.tagName != 'SCRIPT' &&
        parent.tagName != 'A' &&
        parent.className != 'to-ignore') {
      textNodes.push(walker.currentNode);
    }
  }
  return textNodes;
}

Is there a nicer, more readable way to do this, and how do I express ':not(:has(*))' or ':not(a *)' in the tree walker?

Edit: I don't have the link to the original post, but here is the link to jsfiddle. I also don't want the 'the' inside <a> and <span> to be replaced.

Oliver
  • Does this answer your question? [getElementsByTagName() equivalent for textNodes](https://stackoverflow.com/questions/2579666/getelementsbytagname-equivalent-for-textnodes) – imvain2 Sep 18 '20 at 20:14
  • You can include an *acceptNode* filter function when creating the treeWalker, see [*MDN:Document.createTreeWalker()*](https://developer.mozilla.org/en-US/docs/Web/API/Document/createTreeWalker). I have no idea what the jQuery selector resolves to, can you please explain what it selects with example HTML? – RobG Sep 18 '20 at 20:28
  • @imvain2 sorry don't see anything helpful for me. I want to know how to write ':not(a *)' like 'walker.currentNode.parentNode.tagName != 'A'' but with the '*' and how to write ':not(:has(*))' like 'walker.currentNode.parentNode.tagName != '....' – Oliver Sep 18 '20 at 20:29
  • @RobG jQuery :not() Selector: This selector selects all elements except the specified element. So for example :not(a) does not select test from test but selects test . If you say :not(a *) then it also doesn't select the test inside the a tag. – Oliver Sep 18 '20 at 20:39
  • I understand :not, you need to explain what "body :not(:has(*)):not(script):not(textarea):not(textarea *):not(a *):not(a)" selects. – RobG Sep 18 '20 at 20:53
  • @RobG Well, it returns the body content but without the 'not' stuff. The ':not(:has(*))' only matches leaf elements; otherwise you're matching at every level of the DOM hierarchy. I need the ':not(:has(*))' for my current solution; I don't know if I will need it for the above. – Oliver Sep 18 '20 at 21:16
  • @Oliver So are you looking for text nodes or for element nodes or for both? – Bergi Sep 18 '20 at 22:00
  • @Bergi I'm looking for text nodes, to replace with HTML. I'm finding every node I look for; the problem is that I'm finding too much. For example, I don't want anything inside an a-tag. I'm able to exclude the a-tag itself, but not the elements inside it, and I don't know how to tell that to the treeWalker. – Oliver Sep 18 '20 at 22:24
  • It would be better if you explain what you want to get in plain English or pseudo code. It seems you want textNodes that aren't descendants of A, textarea or script nodes or those with a class "to-ignore". Note that :not is not peculiar to jQuery, it's a standard selector (see [MDN](https://developer.mozilla.org/en-US/docs/Web/CSS/:not)). I don't see how the jQuery selector ignores the class. – RobG Sep 18 '20 at 23:37
  • @RobG here a link to jsfiddle http://jsfiddle.net/x2pq3b9y/3/ You see the 'the' inside and is replaced. I also don't want to be replaced. Or maybe ... the...> I also don't want to be replaced. Nothing inside ... should be replaced. Doesn't depend want content is inside . – Oliver Sep 19 '20 at 21:49

1 Answer

The Document.createTreeWalker() method accepts an optional third argument: a NodeFilter object whose acceptNode function tests each node selected by the whatToShow parameter.

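For nodes that pass the test, the filter function should return NodeFilter.FILTER_ACCEPT. For nodes that fail, it should return NodeFilter.FILTER_REJECT or NodeFilter.FILTER_SKIP (for text nodes, which have no children, the two behave the same).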

When testing the nodes, you can use the matches(selectorList) method from the DOM Element API with a list of selectors that you don't want to match. Text nodes have no matches method, so call it on the text node's parent element. Either use a simple list and negate the result (as in the example), or use the :not(selectorList) pseudo-class.
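As a rough sketch of the two options just mentioned, the helpers below apply the same selector list both ways. The names EXCLUDED, keepByNegation and keepByNotSelector are made up for illustration, and the second form assumes a browser that accepts a selector list inside :not().

```javascript
// Selector list taken from the answer: parent elements whose text is skipped.
const EXCLUDED = 'a, a *, script, textarea, .to-ignore';

// Option 1: pass the plain list to matches() and negate the result in JavaScript.
function keepByNegation(parent) {
  return !parent.matches(EXCLUDED);
}

// Option 2: move the negation into the selector itself.
// Assumption: the browser supports a selector list inside :not().
function keepByNotSelector(parent) {
  return parent.matches(`:not(${EXCLUDED})`);
}
```

Both helpers expect the parent element of a text node and should give the same answer for the same element, so the choice is mostly about which reads better.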

The following also filters out empty text nodes and those with only whitespace, since browsers create whitespace-only text nodes wherever the HTML source has whitespace between element tags (e.g. <p></p> may come with from zero to three such text nodes when parsed, depending on the surrounding code). It also pushes the actual text into the array rather than the text node objects.

function findTextNodes() {
  var walker = document.createTreeWalker(
                 document.body,         // root
                 NodeFilter.SHOW_TEXT,  // nodes to include
                 {acceptNode: filter}   // NodeFilter object
               );
  var textNodes = [];
  while (walker.nextNode()) {
    textNodes.push(walker.currentNode.textContent);
  }
  return textNodes;
}

// NodeFilter function
function filter(node) {
  // Ignore any node whose parent matches a selector in the list
  // and nodes that are empty or only whitespace
  if (!node.parentNode.matches('a, a *, script, textarea, .to-ignore') &&
      !/^\s*$/.test(node.textContent)
     ) {
    // If passes test, return accept value
    return NodeFilter.FILTER_ACCEPT;
  }
  // Otherwise reject explicitly instead of returning undefined
  return NodeFilter.FILTER_REJECT;
}

let textNodes = findTextNodes();
console.log(textNodes.join('\n'));

.to-ignore {
  background-color: yellow;
}

a * {
  color: green;
}

<p>It's the end of the world as we know it,<br>
   and I feel fine</p>
<a>the in a</a>
<br>
<a><span>the in span in a</span></a>
<span class="to-ignore">in to-ignore</span>

The filter function ignores text nodes whose parent element matches any of the following selectors:

  1. a - A elements
  2. a * - all descendants of A elements
  3. script - script elements
  4. textarea - textarea elements
  5. .to-ignore - elements with class "to-ignore"
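The list above only covers descendants for A elements (via a *). If text nested deeper inside, say, a .to-ignore element should be skipped too, one possible variant is to test the ancestors with Element.closest() instead of matches(). This is a minimal sketch, not the code from the answer: SKIP_SUBTREES, isWanted and findWantedTextNodes are hypothetical names, and it returns the text node objects rather than their text.

```javascript
// Assumed list: each entry names an element whose whole subtree is skipped.
const SKIP_SUBTREES = 'a, script, textarea, .to-ignore';

// True when the text node has non-whitespace text and
// neither its parent nor any further ancestor matches the list.
function isWanted(textNode) {
  const parent = textNode.parentElement;
  // closest() tests the parent itself, then each ancestor in turn
  if (parent && parent.closest(SKIP_SUBTREES)) return false;
  return /\S/.test(textNode.textContent);
}

// Browser-only: walk the text nodes under root and collect the wanted ones.
function findWantedTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode: node =>
      isWanted(node) ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT
  });
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);
  return nodes;
}
```

Calling findWantedTextNodes(document.body) in a page would then give the nodes themselves, which is probably what is needed if the goal is to replace their text in place.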
RobG
  • Thank you very much RobG! So that's where the magic happens: '!node.parentNode.matches('a, a *, script, textarea, .to-ignore')' in combination with {acceptNode: filter}. Now I understand how it works. It's already a bit faster. How can I make it even faster? :) – Oliver Sep 20 '20 at 19:43