I'm looking for a regex that extracts the content of an HTML tag. In this case, I need to extract and parse the content of a div element. The HTML inside the div element can be anything, and I need to extract all of it.

I'm using the following regex, but it doesn't work in all cases.

<div\s+id="body"[^>]*>(?<PARAM1>(?:(?:(?!<div[^>]*>|</div>).)+|<div[^>]*>[\s\S]*?</div>)*)</div>

It doesn't work because the captured group PARAM1 stops at a closing div tag before the desired one, and I can't figure out why.

The HTML looks like this:

(...any HTML...)<div id="body">
<div class="container">

    <ul class="breadcrumb">...SOME <li><p>....
    </ul>   

    <h1>...</h1>


    <div class="row">
        <div class="span8">
            <dl class="dl-horizontal">
                <dt>...</dt>
                <dd>..</dd>
                <dt>..</dt>
                <dd>..</dd>             
            </dl>
                <hr/>   
            <dl class="dl-horizontal">
                <dt>..</dt>
                <dd>..</dd>             
            </dl>       

        </div>
        <div class="span4">
            <p class="text-center">
                <img ...>                       
            </p>
        </div> **(STOPS HERE)**
    </div>

       <div> .... ANY HTML </div>

</div> (...more HTML...)
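
The behaviour can be reproduced on a much smaller input. The sketch below is in Python rather than .NET, so the named group is written `(?P<PARAM1>...)` instead of `(?<PARAM1>...)`, and `re.DOTALL` stands in for what is assumed to be `RegexOptions.Singleline` in the original code. The `SAMPLE` string is a made-up reduction of the HTML above, not the real page.

```python
import re

# Same pattern as in the question; only the named-group syntax is adapted to Python.
PATTERN = re.compile(
    r'<div\s+id="body"[^>]*>'
    r'(?P<PARAM1>(?:(?:(?!<div[^>]*>|</div>).)+|<div[^>]*>[\s\S]*?</div>)*)'
    r'</div>',
    re.DOTALL,  # assumption: the original regex runs with Singleline enabled
)

# Hypothetical reduced sample: a container div holding two sibling divs,
# followed by more content that should still belong to div#body.
SAMPLE = (
    'before<div id="body">'
    '<div class="a"><div class="b">x</div><div class="c">y</div></div>'
    '<p>tail</p>'
    '</div>after'
)

match = PATTERN.search(SAMPLE)
captured = match.group("PARAM1")
print(captured)
# -> <div class="a"><div class="b">x</div><div class="c">y</div>
```

The capture ends right after the second inner div, just as in the full page. In this trace the `<div[^>]*>[\s\S]*?</div>` branch pairs `<div class="a">` with the first `</div>` it meets (the one closing `class="b"`), so the `</div>` that really closes `class="a"` is left over and gets consumed by the final `</div>` of the pattern.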

Thanks in advance,

Hybos
  • I will be the first one to say this: do NOT use regex for HTML parsing. Check this out: [stackoverflow.com](http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c) – Tafari Nov 28 '13 at 08:47
  • Let me add that it is impossible to do that with regex because of different grammar complexity classes. – Jonas Bötel Nov 28 '13 at 08:51
  • Don't use Regex for this, HTMLAgilityPack is about 29 times more suited. – Voidpaw Nov 28 '13 at 08:52
  • Read this http://stackoverflow.com/a/1732454/642532 It is impossible. Use an XML parser instead. – Jonas Bötel Nov 28 '13 at 08:54
  • I'm now sure that regex is not the best option. I was already evaluating HTMLAgilityPack as an alternative. Thanks a lot. – Hybos Nov 28 '13 at 09:31
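
To illustrate the parser-based approach the comments recommend, here is a minimal sketch using Python's standard `html.parser` as a stand-in for HTMLAgilityPack (which is a .NET library and is not shown here). The class name `InnerHtmlExtractor` and the helper `extract_inner_html` are hypothetical names chosen for this example. It tracks the nesting depth of `<div>` tags, which is exactly the part the regex cannot do, and rebuilds the markup inside the target div. It is a sketch under simplifying assumptions: comments are dropped and end tags are re-emitted in lowercase.

```python
from html.parser import HTMLParser


class InnerHtmlExtractor(HTMLParser):
    """Collects the markup inside the first <div> whose id matches target_id."""

    def __init__(self, target_id):
        # Keep entities such as &amp; undecoded so they can be re-emitted as written.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # 0 = outside the target div, 1 = directly inside it
        self.done = False   # set once the target div has been closed
        self.parts = []

    def _inside(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self._inside():
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif not self.done and tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <hr/> never change the nesting depth.
        if self._inside():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._inside():
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # this </div> closes the target itself
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._inside():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._inside():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._inside():
            self.parts.append(f"&#{name};")


def extract_inner_html(html, target_id="body"):
    parser = InnerHtmlExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = (
    'before<div id="body">'
    '<div class="a"><div class="b">x</div><div class="c">y</div></div>'
    '<p>tail</p>'
    '</div>after'
)
inner = extract_inner_html(sample)
print(inner)
# -> <div class="a"><div class="b">x</div><div class="c">y</div></div><p>tail</p>
```

Because the depth counter only returns to zero at the `</div>` that matches `<div id="body">`, nested and sibling divs are all kept, however deep they go.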

0 Answers