I'm looking for a regex that extracts the content of an HTML tag. In this case, I need to extract and parse the content of a div element. The HTML inside the div can be anything, including nested div elements, and I need to extract all of it.
I'm using the following regex, but it doesn't work in all cases.
<div\s+id="body"[^>]*>(?<PARAM1>(?:(?:(?!<div[^>]*>|</div>).)+|<div[^>]*>[\s\S]*?</div>)*)</div>
It fails because the captured group PARAM1 stops at a closing div tag before the one I want, and I can't figure out why.
The HTML looks like this:
(...any HTML...)<div id="body">
<div class="container">
<ul class="breadcrumb">...SOME <li><p>....
</ul>
<h1>...</h1>
<div class="row">
<div class="span8">
<dl class="dl-horizontal">
<dt>...</dt>
<dd>..</dd>
<dt>..</dt>
<dd>..</dd>
</dl>
<hr/>
<dl class="dl-horizontal">
<dt>..</dt>
<dd>..</dd>
</dl>
</div>
<div class="span4">
<p class="text-center">
<img ...>
</p>
</div> **(STOPS HERE)**
</div>
<div> .... ANY HTML </div>
</div> (...more HTML...)
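To make the problem reproducible, here is a minimal sketch in Python. Several things in it are assumptions and not part of my real setup: the `HTML` string is a shortened stand-in for the page above, the named group is rewritten as `(?P<PARAM1>...)` because Python's `re` module uses that syntax, and `re.DOTALL` is set so that `.` also matches newlines. The `extract_body` helper is hypothetical; it shows the behaviour I'm after by counting nested div tags, and it assumes no `<div` text appears inside comments, scripts, or attribute values.

```python
import re

# Shortened stand-in for the real page (hypothetical content).
HTML = (
    '<p>before</p><div id="body">\n'
    '<div class="container">\n'
    '<div class="row">\n'
    '<div class="span8">A</div>\n'
    '<div class="span4">B</div>\n'
    '</div>\n'
    '<div>C</div>\n'
    '</div>\n'
    '</div><p>after</p>'
)

# The pattern from the question, with the named group in Python syntax.
ORIGINAL = re.compile(
    r'<div\s+id="body"[^>]*>(?P<PARAM1>(?:(?:(?!<div[^>]*>|</div>).)+'
    r'|<div[^>]*>[\s\S]*?</div>)*)</div>',
    re.DOTALL,
)

# Any opening or closing div tag, and the specific opening tag to start from.
DIV_TAG = re.compile(r'<div\b[^>]*>|</div\s*>', re.IGNORECASE)
OPEN_BODY = re.compile(r'<div\s+id="body"[^>]*>')


def extract_body(html):
    """Return the inner HTML of <div id="body">, or None if it is not found
    or never closed. Nesting is tracked with a simple depth counter."""
    start = OPEN_BODY.search(html)
    if start is None:
        return None
    depth = 1  # we are inside the body div
    for tag in DIV_TAG.finditer(html, start.end()):
        if tag.group().startswith('</'):
            depth -= 1
            if depth == 0:
                # This closing tag matches the body div itself.
                return html[start.end():tag.start()]
        else:
            depth += 1
    return None


match = ORIGINAL.search(HTML)
# The group ends right after the span4 div, so the "C" div is missing.
print(repr(match.group('PARAM1')))
# The depth counter keeps going until the body div itself is closed.
print(repr(extract_body(HTML)))
```

In this sketch the inner `<div[^>]*>[\s\S]*?</div>` alternative is lazy and ends at the first closing div it meets, so it only copes with one level of nesting. After the span4 div is consumed, the next closing tag matches neither alternative, and the final `</div>` of the pattern is satisfied there.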
Thanks in advance,