
I need a regular expression to find out whether or not an h1 tag is followed by an h2 tag, without any paragraph elements in between. I tried to use a negative lookahead, but it doesn't work:

<h1(.+?)</h1>(\s|(?!<p))*<h2(.+?)</h2>
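A quick way to see what this attempt actually accepts is to run it through a regex engine. The sketch below uses Python's re module purely as an assumption for illustration (the question doesn't name a language). The `(?!<p)` alternative is zero-width, so the repeated group can only ever consume whitespace between `</h1>` and `<h2`, and `.` does not cross newlines unless a dot-all flag is set:

```python
import re

# The pattern from the question, unchanged (Python chosen only for illustration).
ORIGINAL = re.compile(r"<h1(.+?)</h1>(\s|(?!<p))*<h2(.+?)</h2>")

# Only whitespace between the headings: this matches.
print(bool(ORIGINAL.search("<h1>A</h1>\n<h2>B</h2>")))  # True

# Other markup (but no <p>) between the headings: no match, because the
# (?!<p) branch matches the empty string and never consumes a character.
print(bool(ORIGINAL.search("<h1>A</h1><div>x</div><h2>B</h2>")))  # False

# A newline inside the h1: no match, because "." stops at "\n" by default.
print(bool(ORIGINAL.search("<h1>\nA</h1><h2>B</h2>")))  # False
```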
Peter Boughton
voodoo555
  • You might find it easier to run it through an HTML parser and walk the DOM. – Matt S May 26 '10 at 15:28
  • [Why using regex to parse HTML is evil](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) Much better would be to use a proper HTML parser that lets you use XPath. – Felix Kling May 26 '10 at 15:30
  • `

    `, `

    `
    – kennytm May 26 '10 at 15:37
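The parser route suggested in the comments can be sketched without any regex. This version assumes Python and its standard-library html.parser, which is event-based rather than the DOM walk or XPath the comments mention; the class and function names are hypothetical. It remembers that an `</h1>` has been seen, forgets it when a `<p>` starts, and reports success if an `<h2>` starts while the flag is still set:

```python
from html.parser import HTMLParser


class HeadingChecker(HTMLParser):
    """Hypothetical helper: records whether an h1 element is followed by an
    <h2> start tag with no <p> start tag in between."""

    def __init__(self) -> None:
        super().__init__()
        self.after_h1 = False  # True after an </h1>, until a <p> is seen
        self.found = False

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag == "p":
            self.after_h1 = False  # a paragraph breaks the h1 -> h2 sequence
        elif tag == "h2" and self.after_h1:
            self.found = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.after_h1 = True


def h1_then_h2_parsed(html: str) -> bool:
    """Hypothetical wrapper: feed the whole document, then read the flag."""
    checker = HeadingChecker()
    checker.feed(html)
    checker.close()
    return checker.found


print(h1_then_h2_parsed("<h1>A</h1>\n<div>x</div><h2>B</h2>"))  # True
print(h1_then_h2_parsed("<h1>A</h1><p>x</p><h2>B</h2>"))        # False
```

Because the parser tokenizes the markup, attributes, comments, and line breaks need no special handling in the pattern itself.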

1 Answer

<h1((?!</h1).)*</h1>((?!<p).)*<h2

should work.

It matches exactly one h1 element, then any number of characters up to an h2 opening tag, but only if no p tag is found along the way.

Since in this scenario it is rather unlikely that nested tags would occur, this should be quite reliable, even with regex.

You'll need to activate your tool's/language's option for the dot to match newline characters. It might be enough to prefix your regex with (?s) to achieve this.
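As a sketch, assuming Python's re module, the `re.DOTALL` flag has the same effect as the inline `(?s)` prefix. The wrapper name `h1_then_h2_regex` is hypothetical:

```python
import re

# The answer's pattern; re.DOTALL (equivalent to a leading (?s)) lets "."
# match newlines, so headings and gaps spanning several lines still work.
H1_THEN_H2 = re.compile(r"<h1((?!</h1).)*</h1>((?!<p).)*<h2", re.DOTALL)


def h1_then_h2_regex(html: str) -> bool:
    """Hypothetical wrapper: True if some h1 reaches an <h2 with no <p between."""
    return H1_THEN_H2.search(html) is not None


print(h1_then_h2_regex("<h1>\nA</h1>\n<div>x</div>\n<h2>B</h2>"))  # True
print(h1_then_h2_regex("<h1>A</h1>\n<p>x</p>\n<h2>B</h2>"))        # False
```

One side effect: `(?!<p)` also stops at any tag that merely starts with p, such as `<pre>`. Writing that lookahead as `(?!<p\b)` would only reject tags named exactly `p`.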

Tim Pietzcker