
I'm using the PCRE regexp engine, and I have a string that looks like this:

<h3 class="description">Description</h3>   <div class="wrapper">  dddsome string blah blahddssssseeeee <div class="empty"> </div></div> </div>          </div>

I also have a regexp that works fine and captures the string "dddsome string blah blahddssssseeeee". It looks like this:

<\s*h3\s*class="*.+?"\s*>.*?</\s*h3>.+?<\s*div.+?class\s*="wrapper"\s*>(.+?)<\s*div\s*class="empty">

Sometimes I get almost the same string, but with an extra <div class="aplus"> tag inside the wrapper, as in the sample below. When this tag appears, I want the regexp above to fail to match the whole string.

<h3 class="description">Description</h3>   <div class="wrapper">  <div class="aplus">  dddsome string blah blahddssssseeeee <div class="empty"> </div></div> </div> 
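
A minimal reproduction of the behavior described above, sketched with Python's `re` module as a stand-in for PCRE (an assumption: the pattern only uses syntax that both engines share). The names `PATTERN`, `plain`, and `with_aplus` are made up for illustration. It shows that the current pattern matches both inputs, which is the part that needs to change for the second one:

```python
import re

# The pattern from the question, split over two lines for readability.
PATTERN = re.compile(
    r'<\s*h3\s*class="*.+?"\s*>.*?</\s*h3>.+?'
    r'<\s*div.+?class\s*="wrapper"\s*>(.+?)<\s*div\s*class="empty">'
)

# Sample without the "aplus" div: should match.
plain = (
    '<h3 class="description">Description</h3>   <div class="wrapper">'
    '  dddsome string blah blahddssssseeeee '
    '<div class="empty"> </div></div> </div>'
)

# Sample with the "aplus" div: should NOT match, but currently does.
with_aplus = (
    '<h3 class="description">Description</h3>   <div class="wrapper">'
    '  <div class="aplus">  dddsome string blah blahddssssseeeee '
    '<div class="empty"> </div></div> </div>'
)

plain_match = PATTERN.search(plain)
aplus_match = PATTERN.search(with_aplus)

print(repr(plain_match.group(1)))
print(repr(aplus_match.group(1)))  # the capture includes the aplus tag
```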
user63898
  • If you are trying to parse HTML or XML with regular expressions, please read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – murgatroid99 May 29 '12 at 19:51
  • Parsing HTML with REs? When will they ever learn? – Donal Fellows May 29 '12 at 19:51
  • In what language/platform are you using this regex? – anubhava May 29 '12 at 20:53
  • I know, I know, but this is what I have, and it has to be parsed with a regexp, so I'm stuck with it and still need to parse it correctly. Thanks. I'm using C++ / Windows / the PCRE regexp engine. – user63898 May 30 '12 at 02:51

1 Answer


Try this:

<div.*>(.*)<div.*>

That said, consider using an HTML parser such as Beautiful Soup, which makes web scraping easier and more reliable.
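
If the match has to fail whenever a div with class "aplus" directly follows the opening wrapper tag, one option is a negative lookahead placed right after that tag. The following is only a sketch under assumptions: it uses Python's `re` module as a stand-in for PCRE (the `(?!...)` syntax is the same in both), it extends the pattern from the question rather than the shorter one above, and `GUARDED` and `extract_description` are hypothetical names.

```python
import re

# The question's pattern plus one negative lookahead: after the wrapper's
# opening tag, refuse to continue if the next tag is <div class="aplus">.
GUARDED = re.compile(
    r'<\s*h3\s*class="*.+?"\s*>.*?</\s*h3>.+?'
    r'<\s*div.+?class\s*="wrapper"\s*>'
    r'(?!\s*<\s*div\s*class="aplus")'
    r'(.+?)<\s*div\s*class="empty">'
)


def extract_description(html):
    """Return the captured text, or None when the pattern does not match."""
    match = GUARDED.search(html)
    return match.group(1).strip() if match else None


plain = (
    '<h3 class="description">Description</h3>   <div class="wrapper">'
    '  dddsome string blah blahddssssseeeee '
    '<div class="empty"> </div></div> </div>'
)
with_aplus = (
    '<h3 class="description">Description</h3>   <div class="wrapper">'
    '  <div class="aplus">  dddsome string blah blahddssssseeeee '
    '<div class="empty"> </div></div> </div>'
)

print(extract_description(plain))       # dddsome string blah blahddssssseeeee
print(extract_description(with_aplus))  # None
```

The lookahead only rejects an "aplus" div that comes immediately after the wrapper tag (ignoring whitespace). If the input contains several description blocks, the lazy `.+?` parts can still skip ahead to a later wrapper, so the pattern may need tightening for real pages.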

Pavan Kumar T S