PHP How to avoid this warning: DOMDocument::loadHTML(): Invalid char in CDATA

Question

I'm trying to collect some info from a web service, but I'm having trouble with pages that contain CDATA sections. Everything works when I use something like this:

$url = 'http://www.example.com';
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($content);

foreach ($doc->getElementsByTagName('h3') as $subtitle) {
    echo $subtitle->textContent; // Outputs the subtitle(s).
}

But when the page contains CDATA sections, the line $doc->loadHTML($content) raises this warning:

Warning: DOMDocument::loadHTML(): Invalid char in CDATA

I found a solution here that I tried to implement, without success:

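function sanitize_html($content) {
  if (!$content) return '';
  // Keep only tab, LF, CR and the two BMP ranges that XML 1.0 allows.
  // Code points above \xFF need the \x{...} syntax and the /u (UTF-8) modifier.
  $invalid_characters = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}]/u';
  // preg_replace() returns null if $content is not valid UTF-8; fall back to the input.
  return preg_replace($invalid_characters, '', $content) ?? $content;
}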

$url = 'http://www.example.com';
$content = file_get_contents($url);
$cleanContent = sanitize_html($content);
$doc = new DOMDocument();
$doc->loadHTML($cleanContent); //Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity

But then I get a different warning:

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity

What could be a good way to deal with the CDATA sections of a page? Greetings.

Maybe use Tidy (http://stackoverflow.com/a/10513231/4471134)? Or simply suppress the warnings with `libxml_use_internal_errors(true);` – Alexey Chuhrov, Apr 27 '17 at 07:54

score 0 · Answer 1 · edited Aug 27 '19 at 13:54

Try setting PCLZIP as the zip class before loading IOFactory, as shown:

require_once '/Classes/PHPExcel.php';
\PHPExcel_Settings::setZipClass(\PHPExcel_Settings::PCLZIP);

edited Aug 27 '19 at 13:54

Itchydon

answered Aug 27 '19 at 12:49

Suhas

score 0 · Answer 2 · edited Aug 27 '19 at 13:38

The solution is to replace the & symbol with &amp;. If you must keep the & as it is, you could enclose it in a CDATA section: <![CDATA[ - ]]>
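
A minimal sketch of the first suggestion, assuming the "no name in Entity" warning comes from bare & characters in the fetched markup. The helper name escape_bare_ampersands is hypothetical, and the pattern only checks that an entity looks well formed, not that its name is actually defined in HTML:

```php
<?php
// Hypothetical helper: escape every "&" that does not already start an entity.
function escape_bare_ampersands(string $html): string
{
    // Negative lookahead: leave "&name;", "&#123;" and "&#x1F;" untouched.
    $pattern = '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)/';

    // preg_replace() can return null on a PCRE error; fall back to the input.
    return preg_replace($pattern, '&amp;', $html) ?? $html;
}

$fixed = escape_bare_ampersands('<h3>Tom & Jerry &amp; friends &#38; co</h3>');
echo $fixed, PHP_EOL; // <h3>Tom &amp; Jerry &amp; friends &#38; co</h3>
```

Note that this runs over the whole string, so an & inside a script block would be escaped as well. Whether that is acceptable depends on the page being parsed.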

edited Aug 27 '19 at 13:38

Artem

answered Aug 27 '19 at 12:57

dılo sürücü

score 0 · Answer 3 · edited Jul 26 '22 at 15:53

Add libxml_use_internal_errors(true) and libxml_clear_errors(). This worked for me; the code is in the screenshot linked below.

https://i.stack.imgur.com/6MN4H.png
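
The screenshot is not reproduced here. A minimal sketch of the approach described above might look like this, assuming the goal is only to keep parser warnings out of the output. The $content string is hypothetical sample markup standing in for the fetched page, and the placement of the two calls around loadHTML() is an assumption:

```php
<?php
// Hypothetical sample markup standing in for file_get_contents($url).
$content = '<html><body><h3>Tom & Jerry</h3><h3>Second</h3></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($content);

// Optionally inspect what the parser complained about, then free the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the earlier error-handling mode for the rest of the script.
libxml_use_internal_errors($previous);

$subtitles = [];
foreach ($doc->getElementsByTagName('h3') as $subtitle) {
    $subtitles[] = $subtitle->textContent;
}
echo implode(PHP_EOL, $subtitles), PHP_EOL;
```

This hides the warnings rather than repairing the markup, so the parser's recovery still decides what ends up in the DOM.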

edited Jul 26 '22 at 15:53

answered Jul 26 '22 at 15:51

Muhammad Shahbaz

Please don't post an image of the code; paste the code here instead. – Simas Joneliunas Jul 28 '22 at 03:56
