Given a fetched HTML page, I want to find the specific node that contains a given piece of text. The hard way, I guess, would be to iterate over all the nodes one by one, going as deep as the tree goes, and search each one with e.g. `.includes()`.
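Roughly, that manual traversal could look like the sketch below. The function name `findDeepestContaining` is made up for illustration, and it only assumes that each node exposes `children` and `textContent` the way DOM elements do:

```javascript
// Hypothetical helper (not a DOM API): returns the deepest elements whose
// text contains `needle`. Works on anything shaped like a DOM element,
// i.e. an object with `children` and `textContent`.
function findDeepestContaining(root, needle) {
    // Nothing in this subtree contains the text: stop descending.
    if (!root.textContent.includes(needle)) return [];

    const matches = [];
    for (const child of Array.from(root.children)) {
        matches.push(...findDeepestContaining(child, needle));
    }
    // If no child contains the text, this element is the deepest match.
    return matches.length > 0 ? matches : [root];
}
```

It would have to be started from an element such as `parsedHtml.body` rather than from the document itself, because a document node's `textContent` is `null`.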

But what is the smart way? There must be something, but I haven't managed to find it by googling.

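    // Assumes this runs inside an async function, in a browser, with axios loaded
    const response = await axios.get(url);
    const parser = new DOMParser(); // assumed: the original did not show how parser was created
    const parsedHtml = parser.parseFromString(response.data, 'text/html');
    // Only checks the document's direct children, never the nested nodes
    for (let i = 0; i < parsedHtml.children.length; i++)
        if (parsedHtml.children[i].textContent.includes('hello'))
            console.log(parsedHtml.children[i]);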

This doesn't work.

Example HTML:

    <html>
     <body>
      <div>dfsdf</div>
      <div>
       <div>dfsdf</div>
       <div>dfsdf</div>
      </div>
      <div>
       <div>
        <div>hello</div>
       </div>
      </div>
      <div>dfsdf</div>
     </body>
    </html>

I would like to retrieve `<div>hello</div>` as a node element.

GWorking
  • Related: [Finding the DOM element with specific text and modify it](https://stackoverflow.com/questions/6132074/finding-the-dom-element-with-specific-text-and-modify-it). – CRice Oct 04 '18 at 19:21
  • Thanks! Your link led me to other links, which finally led to the solution I've just posted :) – GWorking Oct 04 '18 at 20:31

1 Answer

After almost convincing myself that I had to traverse the DOM the classical way, I found the question "Javascript: How to loop through ALL DOM elements on a page?", which offers an excellent approach:

    // Visit every element under parsedHtml, accepting only leaf elements
    // (other than <script>) whose text contains one of the search strings.
    const nodeIterator = document.createNodeIterator(
        parsedHtml,
        NodeFilter.SHOW_ELEMENT,
        (node) => {
            return (node.textContent.includes('mytext1')
                || node.textContent.includes('mytext2'))
                && node.nodeName.toLowerCase() !== 'script' // skip <script> elements
                && node.children.length === 0 // leaf element: it has no child elements
                ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
        }
    );
    const pars = [];
    let currentNode;

    while ((currentNode = nodeIterator.nextNode()))
        pars.push(currentNode);
    if (pars.length > 0)
        console.log(pars[0].textContent); // for example
GWorking