106
$html = file_get_contents("http://www.somesite.com/");

$dom = new DOMDocument();
$dom->loadHTML($html);

echo $dom;

throws

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
Catchable fatal error: Object of class DOMDocument could not be converted to string in test.php on line 10
gweg

12 Answers

176

To suppress the warning, you can use libxml_use_internal_errors(true):

// create new DOMDocument
$document = new \DOMDocument('1.0', 'UTF-8');

// set error level
$internalErrors = libxml_use_internal_errors(true);

// load HTML
$document->loadHTML($html);

// Restore error level
libxml_use_internal_errors($internalErrors);
John Magnolia
Dewsworld
100

I would bet that if you looked at the source of http://www.somesite.com/ you would find special characters that haven't been converted to HTML entities. Maybe something like this:

<a href="/script.php?foo=bar&hello=world">link</a>

Should be

<a href="/script.php?foo=bar&amp;hello=world">link</a>
mattalxndr
  • 8
    Just to expand on this, even if the & character is in text and not in an HTML attribute, it still needs to be escaped to &amp;. The parser throws the error because after seeing an & it expects a ; to terminate the HTML entity. – Kyle Jul 26 '12 at 16:17
  • 25
    ...and to expand further, calling `htmlentities()` or similar on the string will fix the problem. – Ben Jun 26 '13 at 06:16
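If you generate the markup yourself, the escaping described above can be done at the point where the URL is written into the attribute. A minimal sketch, assuming the URL is available as a plain string (the `$url` and `$link` variables are made up for illustration):

```php
<?php
// Hypothetical raw URL containing an unescaped '&' as parameter separator.
$url = '/script.php?foo=bar&hello=world';

// htmlspecialchars() turns '&' into '&amp;', so the attribute value is valid HTML.
$link = '<a href="' . htmlspecialchars($url) . '">link</a>';

echo $link, PHP_EOL;
```

Note that this escapes only the attribute value. Running an entire HTML document through htmlspecialchars() would escape the tags as well.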
56
$dom->@loadHTML($html);

This is incorrect, use this instead:

@$dom->loadHTML($html);
hichris123
Maanas Royy
16

There are 2 errors: the second is because $dom is not a string but an object, and thus cannot be "echoed". The first is a warning from loadHTML, caused by invalid syntax in the HTML document being loaded (probably an & (ampersand) used as a parameter separator and not masked as the entity &amp;).

You can ignore and suppress this error message (not the error, just the message!) by calling the function with the error control operator "@" (http://www.php.net/manual/en/language.operators.errorcontrol.php ):

@$dom->loadHTML($html);
GGets
user279583
12

The reason for your fatal error is DOMDocument does not have a __toString() method and thus can not be echo'ed.

You're probably looking for

echo $dom->saveHTML();
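For illustration, here is a small sketch of serializing the document instead of echoing the object. The sample markup is a stand-in for the fetched page (an assumption, so that nothing is downloaded), and it is already well-formed so no warning is involved:

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<html><body><a href="/script.php?foo=bar&amp;hello=world">link</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Serialize the whole document to a string.
$whole = $dom->saveHTML();

// saveHTML() also accepts a node, to serialize only that part of the tree.
$anchor = $dom->getElementsByTagName('a')->item(0);
$fragment = $dom->saveHTML($anchor);

echo $whole, PHP_EOL;
echo $fragment, PHP_EOL;
```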
Mike B
11

Regardless of the echo (which would need to be replaced with print_r or var_dump), if an exception is thrown the object should stay empty:

DOMNodeList Object
(
)

Solution

  1. Set recover to true, and strictErrorChecking to false

    $content = file_get_contents($url);
    
    $doc = new DOMDocument();
    $doc->recover = true;
    $doc->strictErrorChecking = false;
    $doc->loadHTML($content);
    
  2. Use PHP's entity-encoding functions on the markup's contents; unencoded content is one of the most common sources of this error.

Swivel
Lorenz Lo Sauer
10

Replace the simple

$dom->loadHTML($html);

with the more robust ...

libxml_use_internal_errors(true);

if (!$dom->loadHTML($html)) {
    $errors = "";
    foreach (libxml_get_errors() as $error) {
        $errors .= $error->message . "<br/>";
    }
    libxml_clear_errors();
    print "libxml errors:<br>$errors";
    return;
}
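As far as I know, loadHTML() usually still returns true for markup that libxml manages to recover from, so it may be worth reading the collected messages regardless of the return value. A rough sketch under that assumption (the helper name `loadHtmlQuietly` and the sample markup are made up for illustration):

```php
<?php
/**
 * Hypothetical helper: parse HTML without emitting warnings and
 * return both the document and the messages libxml collected.
 */
function loadHtmlQuietly(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Empty the error buffer and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$dom, $messages];
}

[$dom, $messages] = loadHtmlQuietly('<p>Tom & Jerry</p>');

echo $dom->saveHTML();
foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```

Whether any messages are reported for a given input depends on the libxml version PHP was built against.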
David Chan
9
$html = file_get_contents("http://www.somesite.com/");

$dom = new DOMDocument();
$dom->loadHTML(htmlspecialchars($html));

echo $dom->saveHTML();

try this

nmwi22
6

I know this is an old question, but if you ever want to fix the malformed '&' signs in your HTML, you can use code similar to this:

$page = file_get_contents('http://www.example.com');
$page = preg_replace('/\s+/', ' ', trim($page));
fixAmps($page, 0);

$dom = new DOMDocument();
$dom->loadHTML($page);


function fixAmps(&$html, $offset) {
    $positionAmp = strpos($html, '&', $offset);
    if ($positionAmp === false) { // No more '&' signs: nothing left to fix.
        return;
    }

    $positionSemiColumn = strpos($html, ';', $positionAmp+1);

    if ($positionSemiColumn === false) { // If no ';' can be found.
        $html = substr_replace($html, '&amp;', $positionAmp, 1); // Replace straight away.
        fixAmps($html, $positionAmp+5); // Later '&' signs may still need escaping.
        return;
    }

    // Text from the '&' up to and including the next ';'.
    $string = substr($html, $positionAmp, $positionSemiColumn-$positionAmp+1);

    if (preg_match('/^&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z0-9]+);$/', $string) === 0) { // If it is not a standard escape.
        $html = substr_replace($html, '&amp;', $positionAmp, 1); // This means we need to escape the '&' sign.
        fixAmps($html, $positionAmp+5); // Recursive call from the new position.
    } else {
        fixAmps($html, $positionAmp+1); // Recursive call from the new position.
    }
}
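As an alternative to the recursive approach, the same idea can be sketched with a single regular expression that uses a negative lookahead: escape every '&' that is not already the start of something that looks like an entity. The function name `escapeBareAmpersands` is made up for illustration, and the pattern only checks the shape of an entity, not whether the name is a real one:

```php
<?php
/**
 * Hypothetical helper: escape every '&' that does not already begin
 * a numeric (&#39; / &#x27;) or named (&amp;) entity reference.
 */
function escapeBareAmpersands(string $html): string
{
    // (?! ... ) is a negative lookahead: match '&' only when it is NOT
    // followed by "#digits;", "#xhex;" or "name;".
    $pattern = '/&(?!(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)/';

    return preg_replace($pattern, '&amp;', $html);
}

$fixed = escapeBareAmpersands('<a href="/script.php?foo=bar&hello=world">Tom & Jerry &copy; 2012</a>');
echo $fixed, PHP_EOL;
```

This avoids deep recursion on pages with many ampersands, but like any regex-based fix it treats the input as plain text, so it will also touch ampersands inside script or style blocks.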
Nicolas Bouvrette
3

Another possible solution is

$sContent = htmlspecialchars($sHTML);
$oDom = new DOMDocument();
$oDom->loadHTML($sContent);
echo html_entity_decode($oDom->saveHTML());
lastYorsh
  • This will not work. According to http://php.net/manual/en/function.htmlspecialchars.php, all HTML special characters are escaped too. Take for example this piece of HTML code `<span>Hello World</span>`. Running this through `htmlspecialchars` will produce `&lt;span&gt;Hello World&lt;/span&gt;`, which isn't HTML anymore. DOMDocument::loadHTML will not treat it as HTML anymore but as a string. – Twisted Whisper Oct 23 '13 at 10:48
  • This works for me: ``$oDom = new DOMDocument(); $oDom->loadHTML($sHTML); echo html_entity_decode($oDom->saveHTML());`` – Bartłomiej Jakub Kwiatek Mar 31 '16 at 09:50
0

Another possible cause: maybe your file is an ASCII-type file. If so, just change the type (encoding) of your files.

FRANK
-1

Even with this warning my code works fine, so I just suppressed all warning messages with this statement on line 1:

<?php error_reporting(E_ERROR); ?>
Stephen