
I am making a currency-converting script that will scrape all the text from any webpage, find any foreign currency amounts, convert them using an API, and then replace each foreign amount with the converted one. My question is: is there any way I can get the elements at the lowest level, the ones that directly hold the text? For example:

<body>
 <div>
  <div>
   <h1>
     "Hello, world"
   </h1>
  </div>
 <p>
  "How are you today?"
 </p>
 </div>
</body>

How could I get the h1 and the p elements but not the divs, so that my array would be [h1, p]? (Keep in mind I'm trying to do this on a much larger scale, with hundreds of elements.)

I_love_vegetables
  • Loop through all the elements. For each element, check whether its `childElementCount` is `0`; if so, it's a terminal node rather than a container. – Barmar Jul 27 '21 at 17:04
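A minimal sketch of the comment's approach, assuming it runs in a page where `root` is a DOM element such as `document.body` (the function name `leafTextElements` is hypothetical). It lists every descendant with `querySelectorAll("*")` and keeps the ones whose `childElementCount` is `0`. The extra `textContent.trim()` check is an addition that skips empty leaves such as `<br>` or `<img>`. One limitation matters for currency text: in mixed content like `<p>Total: <b>5</b> EUR</p>`, the `p` is not a leaf, so only the `b` is returned and the loose " EUR" text is missed.

```javascript
// Return every element under `root` that has no element children
// (a terminal node) and contains some non-whitespace text.
// `root` is assumed to be a DOM Element such as document.body.
function leafTextElements(root) {
  // querySelectorAll("*") lists all descendant elements in document order.
  return Array.from(root.querySelectorAll("*")).filter(
    (el) => el.childElementCount === 0 && el.textContent.trim() !== ""
  );
}

// Usage in a browser console:
// leafTextElements(document.body); // the question's markup gives [h1, p]
```

On a real page, `<script>` and `<style>` elements are also text-bearing leaves, so you may want to filter those out as well.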

2 Answers


What you want to do is find the non-empty text nodes and then return their parents. A recursive implementation:

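function parentTagsOfText (elem, parentList) {
  // Text nodes have nodeType 3 (Node.TEXT_NODE); 4 is CDATA_SECTION_NODE.
  if (elem.nodeType === Node.TEXT_NODE) {
    // Skip whitespace-only text, such as the indentation between tags.
    if (elem.nodeValue.trim() !== '') {
      parentList.push(elem.parentNode);
    }
    return;
  }
  // Recurse into every child node (elements and text alike).
  if (elem.childNodes) {
    for (let i = 0; i < elem.childNodes.length; i++) {
      parentTagsOfText(elem.childNodes[i], parentList);
    }
  }
}

// Usage: const parents = []; parentTagsOfText(document.body, parents);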
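This approach adds a parent once per text node, so an element with several text runs, such as `<p>a<br>b</p>`, ends up in the list twice. If each element should appear only once, a minimal variant (the name `uniqueTextParents` is hypothetical) can collect the parents in a `Set`, which ignores duplicates and keeps insertion order. It assumes `root` is a DOM node such as `document.body` and compares `nodeType` against 3, the value of `Node.TEXT_NODE`.

```javascript
// Node type value for text nodes (same as Node.TEXT_NODE in browsers).
const TEXT_NODE = 3;

// Return the distinct parents of all non-whitespace text nodes under `root`,
// ordered by where each parent's first text node appears.
// `root` is assumed to be a DOM node such as document.body.
function uniqueTextParents(root) {
  const parents = new Set(); // a Set drops duplicates, keeps insertion order
  const visit = (node) => {
    if (node.nodeType === TEXT_NODE) {
      // Ignore whitespace-only text such as indentation between tags.
      if (node.nodeValue.trim() !== "") parents.add(node.parentNode);
      return;
    }
    for (const child of node.childNodes) visit(child);
  };
  visit(root);
  return Array.from(parents);
}
```

In a browser, `uniqueTextParents(document.body)` would be called directly; `NodeList` is iterable, so the `for...of` loop works on `childNodes`.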
ControlAltDel

Based on this answer, the following function collects all the text nodes under a given element.

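function textNodesUnder(node) {
  let all = [];
  // Walk the direct children; recurse into any that are not text nodes.
  for (node = node.firstChild; node; node = node.nextSibling) {
    if (node.nodeType === 3) all.push(node); // 3 = Node.TEXT_NODE
    else all = all.concat(textNodesUnder(node));
  }
  return all;
}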

Then filter out the empty (whitespace-only) ones and map each remaining text node to its parent:

const textNodesParents = textNodesUnder(document.body)
  .filter(x => x.nodeValue.trim() !== '')
  .map(x => x.parentNode);
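Since the end goal is to replace currency amounts, it may be simpler to edit the text nodes themselves instead of their parents: assigning to a text node's `nodeValue` changes only that run of text, whereas setting a parent's `textContent` would wipe out its child elements. A minimal sketch, assuming amounts are written like `$12.34` with comma thousands separators and that a conversion rate has already been fetched from your API. The regex, the `convertText`/`convertTextNodes` names, the rate and the symbol are hypothetical placeholders.

```javascript
// Hypothetical pattern for dollar amounts such as "$5", "$1299.99" or "$1,299.99".
// The first alternative requires comma groups, so "$1,000, then" keeps its trailing comma.
const USD_PATTERN = /\$(\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)/g;

// Replace every matched amount in `text` with its converted value.
// `rate` and `symbol` are assumptions, e.g. a rate returned by your API.
function convertText(text, rate, symbol) {
  return text.replace(USD_PATTERN, (_match, amount) => {
    const value = Number(amount.replace(/,/g, "")) * rate;
    return symbol + value.toFixed(2);
  });
}

// Rewrite each text node in place; `textNodes` could come from textNodesUnder().
function convertTextNodes(textNodes, rate, symbol) {
  for (const node of textNodes) {
    const updated = convertText(node.nodeValue, rate, symbol);
    if (updated !== node.nodeValue) node.nodeValue = updated;
  }
}
```

For example, `convertText("It costs $10 or $1,299.99.", 2, "EUR ")` returns `"It costs EUR 20.00 or EUR 2599.98."`. Note that `toFixed` drops thousands separators; `Intl.NumberFormat` is an option if locale-aware output is needed.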
Kerap