
I've looked through the other Stack Overflow questions on this topic, and none of the solutions provided there seem to work for me.

I have an HTML page (scraped with file_get_contents()), and in that HTML is a div with an id of "main". I need to get the contents of that div with PHP's DOMDocument, or something similar. In this situation I can't use the SimpleHTMLDom parser, which complicates things a bit.

Edited by hakre; asked by Charles Zink.
  • When you say *I need to get the contents of that div*, do you mean the HTML? – alex Jun 20 '11 at 01:03
  • [DOMElement getElementById ( string $elementId )](http://php.net/manual/en/class.domdocument.php) – Ibu Jun 20 '11 at 01:04

2 Answers


DOMDocument + XPath variation:

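// $temp holds the HTML fetched with file_get_contents()
$xml = new DOMDocument();
$xml->loadHTML($temp);
$xpath = new DOMXPath($xml);

// Serialise each element child of <div id="main"> as XML ("innerXML")
$html = '';
foreach ($xpath->query('//div[@id="main"]/*') as $node)
{
    $html .= $xml->saveXML($node);
}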
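A self-contained sketch of the variant above, run against a small made-up HTML string (the `$temp` value is an assumption standing in for the scraped page). It shows that `/*` selects only the element children of the div, so loose text directly inside it, such as "Hello ", is not collected:

```php
<?php
// Made-up sample input standing in for the scraped page ($temp in the answer).
$temp = '<html><body><div id="main">Hello <b>world</b>!</div><p>outside</p></body></html>';

libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$xml = new DOMDocument();
$xml->loadHTML($temp);
$xpath = new DOMXPath($xml);

// '/*' matches element children only; the text nodes "Hello " and "!" are skipped.
$html = '';
foreach ($xpath->query('//div[@id="main"]/*') as $node) {
    $html .= $xml->saveXML($node);
}

echo $html, PHP_EOL; // prints <b>world</b>
```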

If you're looking for innerHTML() (see the PHP DOMDocument reference question) rather than innerXML() as in this answer, the XPath-related variant is given in another answer.

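Here is the adaptation, with the two changes marked in comments:

$html = '';
// change 1: '/node()' instead of '/*', so text nodes are included as well
foreach ($xpath->query('//div[@id="main"]/node()') as $node)
{
    // change 2: saveHTML() instead of saveXML()
    $html .= $xml->saveHTML($node);
}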
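A runnable sketch of the `node()` + `saveHTML()` variant, again on made-up sample markup. Because `node()` also matches text nodes, the loose text around `<b>` is kept, and `saveHTML()` writes the void element `<br>` the HTML way, whereas `saveXML()` produces `<br/>`. Passing a node to `saveHTML()` requires PHP 5.3.6 or later:

```php
<?php
// Made-up sample markup; in the question this comes from file_get_contents().
$temp = '<html><body><div id="main">Hello <b>world</b>!<br></div></body></html>';

libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$xml = new DOMDocument();
$xml->loadHTML($temp);
$xpath = new DOMXPath($xml);

// '/node()' matches text nodes as well as elements, so nothing inside the div is lost.
$html = '';
foreach ($xpath->query('//div[@id="main"]/node()') as $node) {
    $html .= $xml->saveHTML($node); // node argument needs PHP >= 5.3.6
}

echo $html, PHP_EOL; // prints Hello <b>world</b>!<br>

// For comparison, saveXML() serialises the same <br> element as XML.
$br = $xpath->query('//div[@id="main"]/br')->item(0);
echo $xml->saveXML($br), PHP_EOL; // prints <br/>
```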
Edited by Community; answered by hakre.

Using DOMDocument...

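// $html holds the page fetched with file_get_contents()
$dom = new DOMDocument;
$dom->loadHTML($html);

// Returns the element whose id is "main", or null if there is none
$main = $dom->getElementById('main');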
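`getElementById()` returns null when no element carries the requested id, so a guard before touching `$main->childNodes` avoids a confusing warning later. A minimal sketch on a made-up page; the exception type and message are illustrative choices, not part of the answer:

```php
<?php
// Made-up sample page; in the question it comes from file_get_contents().
$html = '<html><body><div id="main"><p>Hi</p></div></body></html>';

libxml_use_internal_errors(true); // tolerate sloppy real-world markup
$dom = new DOMDocument;
$dom->loadHTML($html);

// getElementById() returns null when no element has that id, so check before use.
$main = $dom->getElementById('main');
if ($main === null) {
    throw new RuntimeException('No element with id="main" found');
}

echo $main->nodeName, PHP_EOL;              // prints div
var_dump($dom->getElementById('missing')); // prints NULL
```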

To get the serialised HTML...

$html = '';
foreach ($main->childNodes as $node) {
    // LIBXML_NOEMPTYTAG writes <span></span> rather than <span/> for empty elements
    $html .= $dom->saveXML($node, LIBXML_NOEMPTYTAG);
}

Use saveHTML($node) instead if your PHP version supports passing it a node (PHP 5.3.6 and later).
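To see why the flag matters, here is a small comparison on made-up markup containing an empty element (the variable names `$page`, `$asXml`, `$asXmlNoEmpty` and `$asHtml` are illustrative). Plain `saveXML()` self-closes the empty `<span>`, which an HTML parser does not treat as closed for non-void elements; `LIBXML_NOEMPTYTAG` and `saveHTML()` both emit an explicit end tag:

```php
<?php
// Made-up sample with an empty element, the case LIBXML_NOEMPTYTAG is about.
$page = '<html><body><div id="main"><span></span>text</div></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($page);
$main = $dom->getElementById('main');

// Serialise the children of #main three ways to compare the output.
$asXml = $asXmlNoEmpty = $asHtml = '';
foreach ($main->childNodes as $node) {
    $asXml        .= $dom->saveXML($node);
    $asXmlNoEmpty .= $dom->saveXML($node, LIBXML_NOEMPTYTAG);
    $asHtml       .= $dom->saveHTML($node); // node argument needs PHP >= 5.3.6
}

echo $asXml, PHP_EOL;        // prints <span/>text
echo $asXmlNoEmpty, PHP_EOL; // prints <span></span>text
echo $asHtml, PHP_EOL;       // prints <span></span>text
```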

Answered by alex.