33

I first ran the code on MAMP and it worked fine. But when I tried to run it on another server, I got a lot of warnings like:

Warning: DOMDocument::loadHTML(): Unexpected end tag : head in Entity, line: 3349 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
Warning: DOMDocument::loadHTML(): htmlParseStartTag: misplaced tag in Entity, line: 3350 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
Warning: DOMDocument::loadHTML(): Tag header invalid in Entity, line: 3517 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17

The code is as follows:

<?php
 $amazon = file_get_contents('http://www.amazon.com/blablabla');
 $doc = new DOMdocument();
 $doc->loadHTML($amazon);
 $doc->saveHTML();
 $price = $doc -> getElementById('actualPriceValue')->textContent;
 $ASIN = $doc -> getElementById('ASIN')->getAttribute('value');
?>

Does anyone know what's going on? Thanks!

Syscall
LuZ

3 Answers

136

To disable the warning, you can use

libxml_use_internal_errors(true);

This works for me. See the manual entry for libxml_use_internal_errors, or read on for the background:


Background: You are loading invalid HTML. Invalid HTML is quite common, and DOMDocument::loadHTML corrects most of the problems, but it emits a warning for each one by default.

With libxml_use_internal_errors you can control that behavior. Set it before loading the document:

$previously = libxml_use_internal_errors(true);
$doc->loadHTML($amazon);

Then after loading you can deal with the errors (if you want/need to):

/* @var LibXMLError[] $xmlErrors */
$xmlErrors = libxml_get_errors();
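
Each entry returned by libxml_get_errors() is a LibXMLError object with public properties such as level, code, message, and line. As a minimal sketch of turning them into readable log lines, the following may help; the sample markup and the $levelNames map are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical sample markup with a stray closing tag, standing in for the fetched page.
$html = '<html><body><span id="actualPriceValue">$9.99</span></div></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

// Map libxml severity constants to readable labels (illustrative helper).
$levelNames = [
    LIBXML_ERR_WARNING => 'warning',
    LIBXML_ERR_ERROR   => 'error',
    LIBXML_ERR_FATAL   => 'fatal',
];

$messages = [];
foreach (libxml_get_errors() as $error) {
    $level = $levelNames[$error->level] ?? 'unknown';
    $messages[] = sprintf(
        '[%s] code %d, line %d: %s',
        $level,
        $error->code,
        $error->line,
        trim($error->message)
    );
}

// The document is still usable even if the parser complained.
$price = $doc->getElementById('actualPriceValue')->textContent;
```

Whether a given piece of broken markup produces a message, and at which level, depends on the installed libxml version, so the contents of $messages can differ between servers.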

And finally clear them (as they will add up) and restore the previous setting if applicable:

unset($xmlErrors);
libxml_clear_errors();
libxml_use_internal_errors($previously);
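
The three steps above can be wrapped in one function so the previous setting is always restored. This is only a sketch: the function name loadHtmlQuietly and the sample markup are hypothetical, and it assumes PHP 7.1 or later for the array destructuring.

```php
<?php
/**
 * Hypothetical helper: parse HTML without emitting warnings and return
 * the document together with the libxml errors collected while loading.
 */
function loadHtmlQuietly(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    try {
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        $errors = libxml_get_errors();
        return [$doc, $errors];
    } finally {
        // Runs even if loading throws: empty the buffer, then restore the old setting.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

// Usage with deliberately broken sample markup.
[$doc, $errors] = loadHtmlQuietly('<p id="ASIN-demo">B000</p></head>');
echo count($errors), " parser message(s) collected\n";
```

Using try/finally keeps the error buffer from growing across calls and leaves the global libxml setting as the caller found it.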


hakre
  • 1
    Also, in the future, do not forget to visit the manual page of any function that raises errors. You will often find useful notes and usage information there, as well as user comments. See [`DOMDocument::loadHTML`](http://de.php.net/DOMDocument.loadHTML) – hakre Aug 05 '12 at 20:09
  • @user1577801: If this answer solved your problem, consider upvoting and accepting it, by clicking on the large green tick mark under the answer's score. – Madara's Ghost Aug 05 '12 at 20:12
6

This problem is caused by markup that is not clean XHTML.

DOMDocument::loadHTML() emits a warning for every piece of malformed markup it has to repair, so one way to avoid the warnings is to clean up the markup before loading it.

PHP has an extension that does this job pretty well, called Tidy: php.net/book.tidy

It might be tricky, as you may need to enable it in your php.ini.

Then:

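// $html holds the raw page source, e.g. the result of file_get_contents().
$tidy_config = array(
    'clean'          => true,
    'output-xhtml'   => true,
    'show-body-only' => true,
    'wrap'           => 0,
);

// Parse and repair the markup, then hand the cleaned string to DOMDocument.
$tidy = tidy_parse_string($html, $tidy_config, 'UTF8');
$tidy->cleanRepair();

$doc = new DOMDocument();
$doc->loadHTML((string) $tidy);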
Pascal
4

You can suppress the warning like this:

@$doc->loadHTML($amazon);
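
The @ operator hides any diagnostics raised by that call, not only the parser warnings. As a narrower alternative, loadHTML() accepts libxml option flags as a second argument (PHP 5.4 and later). The sketch below assumes the LIBXML_NOERROR and LIBXML_NOWARNING flags are honored by the installed libxml version; the sample markup is hypothetical.

```php
<?php
// Hypothetical sample markup with a misplaced closing tag.
$html = '<html><body><input id="ASIN" value="B000TEST"></head></body></html>';

$doc = new DOMDocument();
// Ask libxml itself not to report recoverable errors and warnings.
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);

$asin = $doc->getElementById('ASIN')->getAttribute('value');
```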
Aminah Nuraini