1

I'm trying to create a regex that matches all the content from <div class="entrytext"> to the first </p> after this div.

At the moment this is what I have:

(?<=<div class="entrytext">.*<p>).*(?></p>)

It's going well, because none of the code above this div is matched, but the issue I'm having is that there are a lot of </p> tags after this <div> in the document.

What I would like is to take all the content after this div, but only up to the first </p> found.

Could you give me a hand? Thanks in advance.

BoltClock
Jose3d
  • What programming language? Also, http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – BoltClock Apr 10 '11 at 17:53

1 Answer

3
  1. Most regex engines don't allow variable-length lookbehinds.
  2. You would need non-greedy quantifiers (a ? after your *):
    (?<=<div class="entrytext">.*?<p>).*?(?></p>)
  3. Regex is (surprisingly, for once) the tool for this job, but still look into HTML parsers; whatever you are doing that needs this would probably benefit from one.
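
As a minimal sketch of points 1 and 2 together, assuming Python (the question does not say which language is used): Python's `re` module is one of the engines that rejects variable-length lookbehinds, so this version swaps the lookbehind for a capture group and keeps the non-greedy quantifiers. The sample `html` string is made up for illustration.

```python
import re

# Hypothetical sample input: one <p> before the div, two <p> inside it.
html = (
    '<p>intro</p>'
    '<div class="entrytext"><p>first paragraph</p>'
    '<p>second paragraph</p></div>'
)

# A capture group stands in for the variable-length lookbehind.
# Both .*? are non-greedy, so matching stops at the first <p> after
# the div and then at the first </p> after that.
# re.DOTALL lets "." match newlines, in case the markup spans lines.
pattern = re.compile(r'<div class="entrytext">.*?<p>(.*?)</p>', re.DOTALL)

match = pattern.search(html)
first_paragraph = match.group(1) if match else None
print(first_paragraph)
```

Note that `(?></p>)` in the pattern above is an atomic group in the flavors that support it, so the `</p>` becomes part of the match; a lookahead would be written `(?=</p>)`.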
J V
  • @Jose3d: Make sure you understand _why_ it works. Look up 'greedy' and 'non-greedy' in the documentation or peruse http://www.regular-expressions.info – sehe Apr 10 '11 at 19:54
  • @sehe, why don't you tell him `?` is a quantifier as well as a quantifier modifier. –  Apr 11 '11 at 04:22
  • I think I got it: "?" takes as little as possible, while other quantifiers take as much as possible. Thanks – Jose3d Apr 12 '11 at 16:00
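
To make the greedy versus non-greedy distinction from these comments concrete, here is a small sketch, again assuming Python and a made-up input string. It also shows the other role of `?` mentioned above, as a plain "zero or one" quantifier.

```python
import re

# Hypothetical input with two paragraphs on one line.
text = "<p>one</p><p>two</p>"

# Greedy: .* grabs as much as it can, then backtracks to the LAST </p>.
greedy = re.search(r"<p>(.*)</p>", text).group(1)

# Non-greedy: .*? grabs as little as it can, stopping at the FIRST </p>.
lazy = re.search(r"<p>(.*?)</p>", text).group(1)

# On its own, ? is a quantifier meaning "zero or one of the previous item".
optional_u = re.fullmatch(r"colou?r", "color") is not None

print(greedy)      # one</p><p>two
print(lazy)        # one
print(optional_u)  # True
```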