JavaScript reg exp between closing tag and opening tag

Question

How do I use a regular expression to select the text after the </h2> closing tag, up to the next <h2> opening tag?

<h2>my title here</h2>
Lorem ipsum dolor sit amet <b>with more tags</b>
<h2>my title here</h2>
consectetur adipisicing elit quod tempora

In this case I want to select this text: Lorem ipsum dolor sit amet <b>with more tags</b>

Possible duplicate: [*RegEx match open tags except XHTML self-contained tags*](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). — RobG, Mar 11 '16 at 00:48
It's exactly your question: you're trying to parse HTML with a regular expression. That can be done for a limited set of conditions, but will not work in general. — RobG, Mar 11 '16 at 01:37

sideroxylon · Answer 1 · 2016-03-11T02:11:54.123

1

Try this: /<\/h2>(.*?)<h2/

This finds a closing </h2> tag, then lazily captures everything up to the next <h2 opening tag.

In JS, you'd do this to get just the text:

substr = str.match(/<\/h2>(.*?)<h2/)[1];


var str = '<h2>my title here</h2>Lorem ipsum <b>dolor</b> sit amet<h2>my title here</h2>consectetur adipisicing elit quod tempora';

var substr = str.match(/<\/h2>(.*?)<h2/)[1].replace(/<.*?>/g, '');

console.log(substr);
// logs: Lorem ipsum dolor sit amet
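
The TypeError mentioned in the comments is what you get when match() returns null, for example when the string contains line breaks, because . does not match newline characters. A minimal sketch of a more defensive variant follows. It assumes the tags should stay in the result, and the helper name textAfterFirstHeading is made up for illustration; it is not part of the original answer.

```javascript
// Hypothetical helper: returns the raw HTML between the first </h2>
// and the next <h2, or null when no such pair exists.
function textAfterFirstHeading(html) {
  // [\s\S] matches any character, including line breaks (unlike ".").
  // The lazy *? stops the capture at the nearest following <h2.
  var match = /<\/h2>([\s\S]*?)<h2/.exec(html);
  // exec() returns null when nothing matches, so guard before indexing.
  return match ? match[1].trim() : null;
}

var multiline = '<h2>my title here</h2>\n' +
  'Lorem ipsum dolor sit amet <b>with more tags</b>\n' +
  '<h2>my title here</h2>\n' +
  'consectetur adipisicing elit quod tempora';

console.log(textAfterFirstHeading(multiline));
// logs: Lorem ipsum dolor sit amet <b>with more tags</b>

console.log(textAfterFirstHeading('no headings here'));
// logs: null
```

Returning null instead of throwing lets the caller decide what to do when the markup does not contain two headings.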

edited Mar 11 '16 at 02:11

answered Mar 11 '16 at 00:54


It's not working, I'm getting: Uncaught TypeError: Cannot read property '1' of null(…) – user4571629 Mar 11 '16 at 01:06
You will probably find that's because your string has breaks. Can you sanitise the string first? – sideroxylon Mar 11 '16 at 01:13
I don't know what does it mean, I just need to select all the content until the closing – user4571629 Mar 11 '16 at 01:15
Updated with sample. – sideroxylon Mar 11 '16 at 01:26
That's not working right, because If I have inside of the content other tag, for example: Lorem ipsum strong tag it will end right after the tag – user4571629 Mar 11 '16 at 01:36
That wasn't clear in the original question - fixed in the sample above. – sideroxylon Mar 11 '16 at 02:09

zero298 · Answer 2 · 2016-03-11T02:11:17.657

0

Try

/<\/h2>((?:\s|.)*)<h2/

You can see it in action in the example below.

(function() {
  "use strict";

  var inString, regEx, res, outEl;

  outEl = document.getElementById("output");

  inString = "<h2>my title here</h2>\n" +
    "Lorem ipsum dolor sit amet <b>with more tags</b>\n" +
    "<h2> my title here </h2>\n" +
    "consectetur adipisicing elit quod tempora";

  regEx = /<\/h2>((?:\s|.)*)<h2/;

  res = regEx.exec(inString);

  console.log(res);
  res.slice(1).forEach(function(match) {
    var newEl = document.createElement("pre");
    newEl.innerHTML = match.replace(/</g, "&lt;").replace(/>/g, "&gt;");
    outEl.appendChild(newEl);
  });
}());

<main>
  <div id="output"></div>
</main>

I added \n to your example to simulate new lines. No idea why you aren't just selecting the <h2> with a querySelector() and getting the text that way.
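
One caveat with the pattern above: the * quantifier is greedy, so with more than two headings the capture runs to the last <h2 rather than the next one. The sketch below compares it with a lazy variant and collects every section. The helper name sectionsAfterHeadings and the three-heading sample string are assumptions made for illustration.

```javascript
// Hypothetical helper: collects the HTML that follows each </h2>,
// up to the next <h2 or the end of the string.
function sectionsAfterHeadings(html) {
  // Lazy [\s\S]*? plus a lookahead, so the next <h2 is not consumed
  // and the following section can still be found.
  var pattern = /<\/h2>([\s\S]*?)(?=<h2|$)/g;
  var sections = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    sections.push(match[1].trim());
  }
  return sections;
}

var threeHeadings = '<h2>one</h2>\n' +
  'first <b>body</b>\n' +
  '<h2>two</h2>\n' +
  'second body\n' +
  '<h2>three</h2>\n' +
  'third body';

// The greedy pattern swallows the middle heading as well.
var greedy = /<\/h2>((?:\s|.)*)<h2/.exec(threeHeadings)[1];
console.log(greedy.indexOf('<h2>two</h2>') !== -1);
// logs: true

console.log(sectionsAfterHeadings(threeHeadings));
// logs: [ 'first <b>body</b>', 'second body', 'third body' ]
```

Using [\s\S] instead of (?:\s|.) also avoids an alternation whose branches overlap on whitespace, which can make a failing match backtrack heavily on long input.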

edited Mar 11 '16 at 02:11

answered Mar 11 '16 at 00:51


But I want it without selecting the tags – user4571629 Mar 11 '16 at 00:53
@user4571629 Try now, use a capture group – zero298 Mar 11 '16 at 00:54
That's not selecting anything, am I missing something? can you give me an example with console.log like on jsfiddle? – user4571629 Mar 11 '16 at 01:07

Rajshekar Reddy · Answer 3 · 2016-03-11T07:21:49.997

Match the tags and remove them using the string replace() function. This solution also removes self-closing tags such as <br/> and <hr/>.

var htmlToParse = document.getElementsByClassName('input')[0].innerHTML;

htmlToParse = htmlToParse.replace(/[\r\n]+/g, ""); // collapse the multi-line HTML string into a single line

var selectedRangeString = htmlToParse.match(/(<h2>.+<h2>)/g); // greedy: match from the first <h2> through the last <h2> opening tag

var parsedString = selectedRangeString[0].replace(/((<\w+>(.*?)<\/\w+>)|<.*?>)/g, ""); // remove each element together with the text inside it, plus leftover single tags such as <br/> and <hr/>

document.getElementsByClassName('output')[0].innerHTML += parsedString;

<div class='input'>
    <i>Input</i>

  <h2>my title here</h2>
  Lorem ipsum dolor sit amet <br/> <b>with more tags</b>
<hr/>
  <h2>my title here</h2>
  consectetur adipisicing elit quod tempora
</div>

<hr/>
<div class='output'>
  <i>Output</i>
  <br/>
</div>

A couple of things to remember in the code:

htmlToParse.match(/(<h2>.+<h2>)/g); returns an array of strings, i.e. every string matched by this regex (or null if nothing matches).

selectedRangeString[0]: I am using only the first match for demo purposes. If you want to process all the matches, you can loop over the array with the same logic.
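
As a rough sketch of what looping over every section could look like, the snippet below splits the HTML on whole <h2> elements and strips the remaining tags while keeping the text inside them, which is closer to what the question asks for. This is an alternative technique, not the code from this answer, and the helper names stripTags and plainTextSections are made up for illustration. Like any regex-based approach, it assumes simple, well-formed markup.

```javascript
// Hypothetical helper: removes tags but keeps the text inside them,
// then collapses runs of whitespace into single spaces.
function stripTags(fragment) {
  return fragment.replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: plain text of every section that follows an <h2> element.
function plainTextSections(html) {
  // Splitting on whole <h2>...</h2> elements leaves the content between them.
  var parts = html.split(/<h2[^>]*>[\s\S]*?<\/h2>/);
  var result = [];
  // parts[0] is whatever precedes the first heading, so start at index 1.
  for (var i = 1; i < parts.length; i++) {
    result.push(stripTags(parts[i]));
  }
  return result;
}

var sample = '<i>Input</i>\n' +
  '<h2>my title here</h2>\n' +
  'Lorem ipsum dolor sit amet <br/> <b>with more tags</b>\n' +
  '<hr/>\n' +
  '<h2>my title here</h2>\n' +
  'consectetur adipisicing elit quod tempora';

console.log(plainTextSections(sample));
// logs: [ 'Lorem ipsum dolor sit amet with more tags',
//         'consectetur adipisicing elit quod tempora' ]
```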

That isn't what the OP asked for. Also, what happens if there's an element with no content, e.g. hr or input, in there? — RobG, Mar 11 '16 at 02:19
ohhh I see the complexity now.. I will tweek my logic. Thanks @RobG — Rajshekar Reddy, Mar 11 '16 at 02:21
@RobG I fixed my code. Now this is what the OP asks for right? — Rajshekar Reddy, Mar 11 '16 at 07:16
