
I'm receiving an HTML string that I need to parse so that I can read the text under a certain heading. More specifically, there is a div element that contains several h2 elements, and I need to read only the text between the 3rd and 4th h2 headings, i.e. the Summary section.

<div>
    <h2>Risks</h2>
    <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
    <p>Donec fermentum orci nec felis.</p>
    <h2>Affected Systems</h2>
    <p>Sed sollicitudin diam id sapien.</p>
    <p>Ut libero.</p>
    <h2>Summary</h2>
    <!-- from here -->
    <p>Vestibulum quam libero, malesuada et, ornare id, aliquet id, tellus.</p>
    <p>Nullam dapibus viverra quam.</p>
    <p>Vestibulum sit amet nunc vel justo dictum pharetra.</p>
    <!-- through here -->
    <h2>Avoidance</h2>
    <p>Proin eleifend mi eget massa.</p>
    <p>Pellentesque feugiat sapien a ante.</p>
</div>
outis
  • Does this answer your question? [Parse an HTML string with JS](https://stackoverflow.com/questions/10585029/parse-an-html-string-with-js) and [how can I select all elements between two elements](https://stackoverflow.com/questions/12794874/how-can-i-select-all-elements-between-two-elements) – evolutionxbox Dec 13 '21 at 12:48
  • Do not post links to screenshots here; post the actual code you are having trouble with and describe your specific challenge, so we can best assist you with your issue – Mark Schultheiss Dec 13 '21 at 13:29

2 Answers


Good question. A recursive function works well for this. It takes a root element, a start node (the third h2) and an end node (the fourth h2), walks every node under the root in document order, and collects the non-empty text nodes that come after the start node and before the end node. Here each text node is logged to the console, but you can concatenate them into a string instead.

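// Collects the non-whitespace text nodes that lie between startNode and endNode
// (in document order) somewhere inside rootNode.
function getTextFromTo(rootNode, startNode, endNode) {
    let pastStartNode = false, reachedEndNode = false;
    const textNodes = [];

    function getTextNodes(node) {
        if (node === startNode) {
            pastStartNode = true;            // start collecting after this node
        } else if (node === endNode) {
            reachedEndNode = true;           // stop the whole traversal
        } else if (node.nodeType === Node.TEXT_NODE) {
            // Keep only text inside the range that is not pure whitespace.
            if (pastStartNode && !reachedEndNode && !/^\s*$/.test(node.nodeValue)) {
                textNodes.push(node);
            }
        } else {
            // Recurse into the children until the end node has been reached.
            for (let i = 0, len = node.childNodes.length; !reachedEndNode && i < len; ++i) {
                getTextNodes(node.childNodes[i]);
            }
        }
    }

    getTextNodes(rootNode);
    return textNodes;
}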


// The Summary section starts at the 3rd <h2> and ends at the 4th.
const root = document.querySelector('div');
const headings = root.querySelectorAll('h2');
const from = headings[2]; // from
const to = headings[3];   // to

const textNodes = getTextFromTo(root, from, to);

// Log each text node; join their .data values instead if you need one string.
for (const node of textNodes) {
    console.log(node.data);
}
<div class="col-md-12">
  <h2>title 1</h2>
  <ul><li></li></ul>
    
  <h2>title 2</h2>
  <ul><li></li></ul>
  <h2>Resume</h2>
  <p>text 1</p>
  <p>text 2</p>
  <p>text 3 this one</p>
  <p>text 4</p>
  <p>text 5 this one</p>
  <h2>next title</h2>
</div>

This function was originally written by @TimDown; I only adapted it. Original question: How can I find all text nodes between two element nodes with JavaScript/jQuery?
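Because the paragraphs in the question are direct siblings of the h2 elements, a flatter alternative (the idea behind the "select all elements between two elements" link in the comments) is to walk nextElementSibling from one heading to the next instead of recursing through the whole tree. This is a minimal sketch assuming that flat structure; textBetweenSiblings is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: returns the trimmed text of every element sibling
// strictly between startEl and endEl. Assumes both are children of the
// same parent, as the <h2> and <p> elements are in the question's markup.
function textBetweenSiblings(startEl, endEl) {
  const parts = [];
  // nextElementSibling skips text and comment nodes such as <!-- from here -->.
  for (let el = startEl.nextElementSibling; el && el !== endEl; el = el.nextElementSibling) {
    const text = el.textContent.trim();
    if (text) parts.push(text);
  }
  return parts;
}

// Browser usage: the 3rd and 4th <h2> inside the question's <div>.
if (typeof document !== 'undefined') {
  const headings = document.querySelectorAll('div > h2');
  console.log(textBetweenSiblings(headings[2], headings[3]).join('\n'));
}
```

If endEl is never found (for example, the section is the last one), the loop simply stops at the end of the parent, so the last section is handled too.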

Maik Lowrey

You can use a regular expression for it:

/(?<=<h2>Summary<\/h2>)(.|\n)*?(?=<h2>)/g

This matches everything after <h2>Summary</h2> up to the next <h2> tag. Note that the match still contains the <p> markup, not just the text.
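To show how the lookbehind/lookahead regex above could be applied and reduced to plain text, here is a minimal sketch. sectionTextByRegex is a hypothetical helper; it builds the same pattern for any heading (using [\s\S]*? in place of (.|\n)*?) and then extracts the body of each <p> from the match. It assumes the heading is written exactly as <h2>Heading</h2> with no attributes and that another <h2> follows the section; for less regular HTML, the DOM-based approach is safer.

```javascript
// Hypothetical helper: returns the text of each <p> between
// <h2>heading</h2> and the next <h2>, using a regex instead of the DOM.
function sectionTextByRegex(html, heading) {
  // Escape regex metacharacters so the heading is matched literally.
  const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Lookbehind for the heading, lazily match anything, stop before the next <h2>.
  const re = new RegExp(`(?<=<h2>${escaped}</h2>)[\\s\\S]*?(?=<h2>)`);
  const match = html.match(re);
  if (!match) return []; // heading missing, or no <h2> follows it
  // The match still contains markup, so pull out each <p>...</p> body.
  return [...match[0].matchAll(/<p>([\s\S]*?)<\/p>/g)].map(m => m[1].trim());
}

const sample = '<h2>Summary</h2><p>Nullam dapibus viverra quam.</p><h2>Avoidance</h2>';
console.log(sectionTextByRegex(sample, 'Summary')); // ['Nullam dapibus viverra quam.']
```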