-1

My regex until now: <h1>(.*?)<\/h1>(.*?)(?:<h1>)?

My test String: <h1>Foo</h1><h2>Bar</h2><h1>Baz</h1><h3>Test</h3><h1>ghj</h1>zuio

Right now the second (.*?) matches the shortest string possible, but what I actually want is for it to match everything up to the next match: for the first match <h2>Bar</h2>, for the second <h3>Test</h3>, and so on (underlined in the picture below).
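To make the behaviour concrete, here is a minimal Python sketch (Python's `re` module is an assumption here; the fiddle may use a different flavour) that runs the pattern above against the test string. The names `html`, `pattern` and `matches` are only illustrative. Because the second group is lazy and the trailing `(?:<h1>)?` is optional, nothing forces the second group to consume any characters, so it is empty in every match.

```python
import re

# Test string from the question.
html = "<h1>Foo</h1><h2>Bar</h2><h1>Baz</h1><h3>Test</h3><h1>ghj</h1>zuio"

# Pattern from the question; "/" needs no escaping in Python.
pattern = re.compile(r"<h1>(.*?)</h1>(.*?)(?:<h1>)?")

# findall returns one (group 1, group 2) tuple per match.
matches = pattern.findall(html)
print(matches)
# [('Foo', ''), ('Baz', ''), ('ghj', '')]
```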

Can anyone help me?

Picture of match at: https://regex101.com/

Hkrie
  • fiddle: https://regex101.com/r/6sNe0b/1 – Hkrie May 26 '20 at 20:26
  • [Do not use regular expressions to parse/match HTML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) unless you really really really have to (and are well aware of all implications). – knittl May 26 '20 at 20:33
  • the html is rendered through markdown and I need a dynamic ToC. So I don't really have a choice there. – Hkrie May 27 '20 at 08:07

2 Answers

0
(?<=<h1>)(.+?)(?=<h1>)

This grabs everything before the next "h1" element, but you could extend it to extract only the fields.

The pattern below grabs each field in a list. You could combine the two to get groups beginning with h1:

(?<=<h\d>)(.+?)(?:<\/h\d>)(?=<h\d>)
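
As a rough check of what the two patterns above return, here is a small Python sketch run against the test string from the question (Python's `re` module is an assumption; the variable names are only illustrative). Both lookbehinds are fixed-width, so `re` accepts them.

```python
import re

# Test string from the question.
html = "<h1>Foo</h1><h2>Bar</h2><h1>Baz</h1><h3>Test</h3><h1>ghj</h1>zuio"

# First pattern: everything after an <h1> up to the next <h1>.
until_next_h1 = re.findall(r"(?<=<h1>)(.+?)(?=<h1>)", html)
print(until_next_h1)
# ['Foo</h1><h2>Bar</h2>', 'Baz</h1><h3>Test</h3>']

# Second pattern: the text of each heading that is directly
# followed by another heading.
fields = re.findall(r"(?<=<h\d>)(.+?)(?:</h\d>)(?=<h\d>)", html)
print(fields)
# ['Foo', 'Bar', 'Baz', 'Test']
```

Note that neither pattern matches the last heading (ghj), because no further heading follows it in the test string.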
Mezmer
  • That's not quite it. What I need is for the first group to get the content within the <h1> tags, and the second group to get anything from the first h1 to the following h1 tag. That should repeat for the whole string if possible. – Hkrie May 27 '20 at 08:05
0

<h1>(.*?)<\/h1>((?!<h1>))*

fiddle: https://regex101.com/r/6sNe0b/5

This now works as it should. Thanks anyway to everyone who thought it over with me.
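
One caveat for anyone copying the pattern as it appears here: the second group wraps only a lookahead, which is zero-width, so on its own it cannot capture any text. A common variant that does consume characters puts a `.` after the lookahead (often called a tempered dot). The sketch below assumes that form and Python's `re` module; it is not necessarily what the linked fiddle contains, and `split_sections` is a hypothetical helper name.

```python
import re

# Assumed variant: consume one character at a time for as long as
# the next characters are not the start of another <h1>.
SECTION = re.compile(r"<h1>(.*?)</h1>((?:(?!<h1>).)*)")


def split_sections(html: str) -> list[tuple[str, str]]:
    """Return (h1 text, content up to the next <h1>) pairs."""
    return SECTION.findall(html)


html = "<h1>Foo</h1><h2>Bar</h2><h1>Baz</h1><h3>Test</h3><h1>ghj</h1>zuio"
sections = split_sections(html)
print(sections)
# [('Foo', '<h2>Bar</h2>'), ('Baz', '<h3>Test</h3>'), ('ghj', 'zuio')]
```

If the rendered HTML contains line breaks, the pattern would also need `re.DOTALL` so that `.` can match newlines.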

Hkrie