
I get the following warnings when I call $dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . $html);:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Char 0xD860 out of allowed range in Entity, line: 1 in D:\xampp\xampp\htdocs\xampp\similarity\functions.php on line 438
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Char 0xDEE2 out of allowed range in Entity, line: 1 in D:\xampp\xampp\htdocs\xampp\similarity\functions.php on line 438
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Invalid char in CDATA 0x3 in Entity, line: 1 in D:\xampp\xampp\htdocs\xampp\similarity\functions.php on line 438

How do I target and remove those "invalid" characters using PHP?

Andrej

  • You may need to consider a custom function to filter or translate the invalid characters: http://www.tek-tips.com/viewthread.cfm?qid=1615290 – J A May 09 '12 at 09:14
  • http://stackoverflow.com/questions/3466035/how-to-skip-invalid-characters-in-xml-file-using-php – ohaal May 09 '12 at 09:47

1 Answer

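Not tested, but repairing the markup with Tidy before loading it should work:

// Requires the tidy extension. The third argument tells Tidy to treat input and output as UTF-8.
$tidy = new tidy();
$myHTML = $tidy->repairString('<?xml version="1.0" encoding="UTF-8"?>' . $html, array(), 'utf8');

$dom->loadHTML($myHTML);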
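
If the tidy extension is not available, filtering the string directly is another option. The following is only a minimal sketch, not a tested fix for this exact page. It assumes $html is UTF-8, and that the warnings come from raw control bytes (such as 0x3) and from numeric character references to code points that XML 1.0 does not allow (such as the surrogates 0xD860 and 0xDEE2). The function name strip_invalid_xml_chars is made up for illustration.

```php
<?php
// Hypothetical helper: removes characters that XML 1.0 does not allow.
function strip_invalid_xml_chars(string $html): string
{
    // 1. Remove raw control bytes, keeping tab (0x09), LF (0x0A) and CR (0x0D).
    //    These byte values never occur inside a UTF-8 multibyte sequence.
    $html = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $html);

    // 2. Remove numeric character references (&#123; or &#x7B;) whose
    //    code point falls outside the ranges XML 1.0 permits.
    return preg_replace_callback(
        '/&#(?:x([0-9a-f]+)|([0-9]+));/i',
        function (array $m): string {
            // $m[1] holds the hex digits, $m[2] the decimal digits.
            $cp = ($m[1] !== '') ? hexdec($m[1]) : (int) $m[2];

            $valid = in_array($cp, [0x9, 0xA, 0xD])
                || ($cp >= 0x20 && $cp <= 0xD7FF)
                || ($cp >= 0xE000 && $cp <= 0xFFFD)
                || ($cp >= 0x10000 && $cp <= 0x10FFFF);

            // Keep valid references untouched, drop the rest.
            return $valid ? $m[0] : '';
        },
        $html
    );
}
```

It would then be called before loading, for example: $dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . strip_invalid_xml_chars($html));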
mgraph