<html>
    <head><title>bla bla</title></head>
    <body>
    <div id="mainContent" xmlns:h="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">
        bla bla .....
    </div>
    </body>
</html>

I need to extract that div. How can I do it using PHP 5?

The HTML source is not correctly formatted; some attributes are undefined.

Cœur
cola

2 Answers


If your HTML is not well formed, you can still parse it with DOMDocument, e.g.:

$d = new DOMDocument;
$d->loadHTML($htmlstring);

$x = new DOMXPath($d);

foreach ($x->query('//div[@id="mainContent"]') as $node) {
    echo $node->nodeValue; // prints the div's text content, not its markup
}
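A self-contained version of this approach might look like the sketch below. It assumes the page is already in a string (here the question's sample markup, hard-coded), collects libxml's complaints about the malformed markup with libxml_use_internal_errors() instead of printing warnings, and uses DOMDocument::saveHTML($node) to get the div's full markup rather than only its text. Passing a node to saveHTML() requires PHP 5.3.6 or later.

```php
<?php
// Sample input: the markup from the question (assumed; in practice this
// would come from file_get_contents() or similar).
$htmlstring = <<<HTML
<html>
    <head><title>bla bla</title></head>
    <body>
    <div id="mainContent" xmlns:h="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">
        bla bla .....
    </div>
    </body>
</html>
HTML;

// Collect parse errors internally instead of emitting warnings for bad markup.
libxml_use_internal_errors(true);

$d = new DOMDocument;
$d->loadHTML($htmlstring);
libxml_clear_errors();

$x = new DOMXPath($d);

$divHtml = '';
foreach ($x->query('//div[@id="mainContent"]') as $node) {
    // saveHTML($node) serializes the element itself, tags included.
    $divHtml = $d->saveHTML($node);
    echo $divHtml, "\n";
}
```

If you only need the text, keep nodeValue; if you need the div as HTML, saveHTML($node) is the piece that differs.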

Alternatively, prefix the HTML with <!DOCTYPE html> so that you can use getElementById() as usual.
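A minimal sketch of the getElementById() route, assuming the markup is held in $htmlstring (a hard-coded, trimmed-down sample here). The doctype is simply concatenated in front of the string before parsing; the sample input is illustrative only.

```php
<?php
// Hypothetical input: the div from the question, without a doctype.
$htmlstring = '<html><body><div id="mainContent" xmlns="http://www.w3.org/1999/xhtml">bla bla .....</div></body></html>';

// Keep libxml from printing warnings about the markup.
libxml_use_internal_errors(true);

$d = new DOMDocument;
// Prefix an HTML5 doctype before parsing.
$d->loadHTML('<!DOCTYPE html>' . $htmlstring);
libxml_clear_errors();

// Look the element up directly by its id attribute.
$div = $d->getElementById('mainContent');
if ($div !== null) {
    echo $d->saveHTML($div), "\n"; // whole div, tags included (PHP >= 5.3.6)
}
```

This avoids writing an XPath expression when all you need is a single element with a known id.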

Ja͢ck

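/<div id="mainContent".*?<\/div>/s

Use this pattern (see http://regexr.com?30o0l) if you want to capture everything from the div's opening tag up to the first closing </div> tag. In PHP, pass it to preg_match(); PCRE has no g flag, so use preg_match_all() if you need every match.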
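Wrapped in PHP, the regex approach might look like the sketch below. Both input strings are made-up examples; the second shows why this only suits very simple markup: with a nested div, the lazy .*? ends the match at the first </div>, so the result is cut short.

```php
<?php
// Hypothetical inputs: a flat div and one containing a nested div.
$flat   = '<body><div id="mainContent">bla bla .....</div></body>';
$nested = '<div id="mainContent"><div>inner</div>tail</div>';

// The "s" modifier lets "." match newlines; the "/" in </div> is escaped
// because "/" is also the pattern delimiter.
$pattern = '/<div id="mainContent".*?<\/div>/s';

$flatMatch = null;
if (preg_match($pattern, $flat, $m)) {
    $flatMatch = $m[0];
    echo $flatMatch, "\n"; // the whole div
}

$nestedMatch = null;
if (preg_match($pattern, $nested, $m)) {
    // The lazy ".*?" stops at the first </div>, i.e. the inner one.
    $nestedMatch = $m[0];
    echo $nestedMatch, "\n";
}
```

For markup with nested divs, the DOMDocument approach is the more robust choice.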

Jack
  • This will match anything till the **last** closing tag. It will work only for this very simple example. – stema Apr 23 '12 at 08:54