
I'm attempting to get the DOM elements of external pages. Based on other posts I'm trying:

$html = htmlentities(file_get_contents('http://www.slate.com'));    
$dom = new domDocument;
$dom->loadHTML($html);
echo "<pre>";
var_dump($dom);
echo "</pre>";

(`htmlentities` suppresses the warnings, but otherwise the result is the same as leaving it out.)

Based on what I've read, this should return various DOM parts in parent/child nodes. But the result of the code above contains no DOM nodes, just a huge "textContent" element that contains the entire page HTML.

Thanks in advance for thoughts on what I'm doing wrong.

daprezjer
  • If you want to disable warnings, use `libxml_use_internal_errors(true)`. You can't load a DOMDocument from the output of `htmlentities`. – splash58 Jul 08 '16 at 06:53

2 Answers


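You are looking for `$dom->documentElement`. It returns the root element of the document as a `DOMElement`, which is a `DOMNode` subclass, and you can walk its `childNodes` from there.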

Also: get rid of the `htmlentities` call, because it mangles the HTML you fetch. For example, `<` becomes `&lt;`, which `loadHTML` will not interpret as the start of a tag, so the whole page ends up as plain text. To suppress the warnings instead, take a look at: Disable warnings when loading non-well-formed HTML by DomDocument (PHP)
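To make the two points above concrete, here is a minimal sketch of loading the markup without `htmlentities` while keeping warnings quiet. The inline `$html` string is an assumption that stands in for the `file_get_contents('http://www.slate.com')` call from the question, so the snippet runs without network access.

```php
<?php
// Assumption: a small inline page stands in for the fetched HTML.
$html = '<html><head><title>Demo</title></head>'
      . '<body><p>Hello <b>world</b></p></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html); // raw markup, no htmlentities()

// Discard the collected errors and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parsed tree hangs off the root element.
$root = $dom->documentElement;
echo $root->nodeName, "\n";

// Individual elements are now reachable as nodes.
foreach ($dom->getElementsByTagName('p') as $p) {
    echo $p->textContent, "\n";
}
```

With real pages the markup is rarely well-formed, so the `libxml_use_internal_errors(true)` call is what keeps the output clean; the parsed tree itself is the same either way.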

Dummy-Dump:

function dump(DOMNode $node)
{
    echo $node->nodeName;
    if ($node->hasChildNodes())
    {
        echo '<div style="margin-left:20px; border-left:1px solid black; padding-left: 5px;">';
        foreach ($node->childNodes as $childNode)
        {
            dump($childNode);
        }
        echo '</div>';
    }
}

dump($dom->documentElement);

Which looks like:

[Image: Dummy-Dump output]
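If you run this from the command line rather than in a browser, a plain-text variant of the same idea may be easier to read. This is only a sketch: `dumpText` is a hypothetical helper name, and the tiny inline document is an assumption used to keep the example self-contained.

```php
<?php
// Hypothetical helper: returns the node tree as indented plain text
// instead of echoing nested <div> elements.
function dumpText(DOMNode $node, int $depth = 0): string
{
    // Two spaces of indentation per nesting level.
    $out = str_repeat('  ', $depth) . $node->nodeName . "\n";
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $childNode) {
            $out .= dumpText($childNode, $depth + 1);
        }
    }
    return $out;
}

// Assumption: a small inline document replaces the fetched page.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Hi</p></body></html>');

echo dumpText($dom->documentElement);
```

Text nodes show up under the name `#text`, which is a quick way to confirm that the page was parsed into elements rather than left as one large string.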

SpazzMarticus

You should consider using phpQuery (https://github.com/electrolinux/phpquery).

Łukasz