-3

How do I use a regular expression to select the text after the </h2> closing tag, up to the next <h2> opening tag?

<h2>my title here</h2>
Lorem ipsum dolor sit amet <b>with more tags</b>
<h2>my title here</h2>
consectetur adipisicing elit quod tempora

In this case I want to select this text: Lorem ipsum dolor sit amet <b>with more tags</b>

user4571629
  • Possible duplicate: [*RegEx match open tags except XHTML self-contained tags*](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – RobG Mar 11 '16 at 00:48
  • That's not related to my question – user4571629 Mar 11 '16 at 00:51
  • It's exactly your question: you're trying to parse HTML with a regular expression. That can be done for a limited set of conditions, but will not work in general. – RobG Mar 11 '16 at 01:37

3 Answers

1

Try this: /<\/h2>(.*?)<h2/

This finds a closing </h2> tag, then lazily captures everything up to the next opening <h2> tag.

In JS, you'd do this to get just the text:

substr = str.match(/<\/h2>(.*?)<h2/)[1];

Regex101

var str = '<h2>my title here</h2>Lorem ipsum <b>dolor</b> sit amet<h2>my title here</h2>consectetur adipisicing elit quod tempora';

var substr = str.match(/<\/h2>(.*?)<h2/)[1].replace(/<.*?>/g, '');

console.log(substr);
// logs: Lorem ipsum dolor sit amet
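
One caveat: `.` does not match line breaks, so on the multi-line input from the question this pattern finds no match and `[1]` throws. A minimal sketch that handles newlines and returns every section, assuming `[\s\S]` as the "any character" class; `sectionsAfterH2` is a hypothetical helper name, not part of the answer:

```javascript
// Hypothetical helper: returns the text that follows each </h2>,
// up to the next <h2 opening tag or the end of the string.
function sectionsAfterH2(html) {
  // [\s\S] matches any character including newlines; *? keeps the match lazy.
  var pattern = /<\/h2>([\s\S]*?)(?=<h2|$)/g;
  var sections = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    sections.push(match[1].trim());
  }
  return sections;
}

var input = '<h2>my title here</h2>\n' +
  'Lorem ipsum dolor sit amet <b>with more tags</b>\n' +
  '<h2>my title here</h2>\n' +
  'consectetur adipisicing elit quod tempora';

console.log(sectionsAfterH2(input)[0]);
```

The lookahead `(?=<h2|$)` stops each capture without consuming the next heading, so the last section (which has no following `<h2>`) is returned as well.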
sideroxylon
0

Try

/<\/h2>((?:\s|.)*)<h2/

And you can see it in action on this regex tester.

You can see it in this example below too.

(function() {
  "use strict";

  var inString, regEx, res, outEl;

  outEl = document.getElementById("output");

  inString = "<h2>my title here</h2>\n" +
    "Lorem ipsum dolor sit amet <b>with more tags</b>\n" +
    "<h2> my title here </h2>\n" +
    "consectetur adipisicing elit quod tempora"

  regEx = /<\/h2>((?:\s|.)*)<h2/

  res = regEx.exec(inString);

  console.log(res);
  res.slice(1).forEach(function(match) {
    var newEl = document.createElement("pre");
    newEl.innerHTML = match.replace(/</g, "&lt;").replace(/>/g, "&gt;");
    outEl.appendChild(newEl);
  });
}());
<main>
  <div id="output"></div>
</main>

I added \n to your example to simulate new lines. No idea why you aren't just selecting the <h2> with a querySelector() and getting the text that way.
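
The DOM approach mentioned above could look roughly like this. It is only a sketch: `textAfterHeading` is a hypothetical helper, and it assumes the heading and the content that follows it are siblings under the same parent.

```javascript
// Hypothetical helper: collects the text of every sibling node that follows
// `heading`, stopping at the next <h2> element (or at the last sibling).
function textAfterHeading(heading) {
  var text = "";
  var node = heading.nextSibling;
  while (node && node.nodeName !== "H2") {
    // textContent covers both text nodes and elements such as <b>.
    text += node.textContent;
    node = node.nextSibling;
  }
  return text.trim();
}

// Usage in a browser, assuming the page contains at least one <h2>:
if (typeof document !== "undefined") {
  var firstHeading = document.querySelector("h2");
  if (firstHeading) {
    console.log(textAfterHeading(firstHeading));
  }
}
```

This returns plain text only; if the inner markup (for example the `<b>` tags) is needed, the loop would have to collect `outerHTML` for element nodes instead.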

zero298
0

Match the tags and remove them with the string replace() function. This solution also removes self-closing tags such as <br/> and <hr/>.

var htmlToParse = document.getElementsByClassName('input')[0].innerHTML;

htmlToParse = htmlToParse.replace(/[\r\n]+/g, ""); // collapse the multi-line HTML string into a single line

var selectedRangeString = htmlToParse.match(/(<h2>.+<h2>)/g); // match from the first <h2> through the last <h2> (.+ is greedy)

var parsedString = selectedRangeString[0].replace(/((<\w+>(.*?)<\/\w+>)|<.*?>)/g, ""); // removes paired tags together with the text inside them, as well as single tags such as <br/> and <hr/>

document.getElementsByClassName('output')[0].innerHTML += parsedString;
<div class='input'>
    <i>Input</i>

  <h2>my title here</h2>
  Lorem ipsum dolor sit amet <br/> <b>with more tags</b>
<hr/>
  <h2>my title here</h2>
  consectetur adipisicing elit quod tempora
</div>

<hr/>
<div class='output'>
  <i>Output</i>
  <br/>
</div>

A couple of things to remember about the code:

htmlToParse.match(/(<h2>.+<h2>)/g); returns an array of strings, i.e. all the strings matched by this regex.

selectedRangeString[0]: I am using only the first match for demo purposes. If you want to process all the strings, you can loop over the array with the same logic.
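
Note that `.+` is greedy, so on this input the pattern yields a single match running from the first `<h2>` to the last one. A sketch of processing each section separately, assuming the headings carry no attributes; `stripSections` is a hypothetical name, and unlike the code above it keeps the text inside inline tags such as `<b>`:

```javascript
// Hypothetical helper: splits the HTML at each <h2>...</h2> heading and
// returns the tag-free text of every section that follows a heading.
function stripSections(html) {
  return html
    .split(/<h2>[\s\S]*?<\/h2>/) // cut the string at every complete heading
    .slice(1)                    // drop whatever precedes the first heading
    .map(function (section) {
      // remove the remaining tags but keep their inner text,
      // then collapse runs of whitespace
      return section.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
    });
}

var sample = "<i>Input</i><h2>my title here</h2>" +
  "Lorem ipsum dolor sit amet <br/> <b>with more tags</b><hr/>" +
  "<h2>my title here</h2>consectetur adipisicing elit quod tempora";

console.log(stripSections(sample));
```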

Rajshekar Reddy