
I have a script to parse an XML file of products, but I can't get the parsing to work. Here is the code:

```php
$file = $shop_path.'datafeeds/MC-B01.xml';

$xml = simplexml_load_file($file, null, LIBXML_NOCDATA);
$items = $xml->Items;

// Loop over the <Items> elements actually present instead of a hard-coded 17000
foreach ($items as $item) {
    $name = $item->Product_Name;
    echo $name.'<br />';
}
```

However, I get all kinds of strange errors:

```
PHP Warning: in file C:/xampp/htdocs/trow/tools/rip.php on line 188: simplexml_load_file() [function.simplexml-load-file]: ./../datafeeds/MC-B01.xml:172439: parser error : CData section not finished
PHP Warning: in file C:/xampp/htdocs/trow/tools/rip.php on line 188: simplexml_load_file() [function.simplexml-load-file]: ons&#44; in or out of the water. Cleanup is a snap after the fun with Pipedream
PHP Warning: in file C:/xampp/htdocs/trow/tools/rip.php on line 188: simplexml_load_file() [function.simplexml-load-file]: ^
PHP Warning: in file C:/xampp/htdocs/trow/tools/rip.php on line 188: simplexml_load_file() [function.simplexml-load-file]: ./../datafeeds/MC-B01.xml:172439: parser error : PCDATA invalid Char value 3
PHP Warning: in file C:/xampp/htdocs/trow/tools/rip.php on line 188: simplexml_load_file() [function.simplexml-load-file]: ons&#44; in or out of the water. Cleanup is a snap after the fun with Pipedream
PHP Warning: in file C:/xampp/htdocs/trow/tools/rip.php on line 188: simplexml_load_file() [function.simplexml-load-file]: ^
PHP Warning: in file C:/xampp/htdocs/trow/tools/rip.php on line 188: simplexml_load_file() [function.simplexml-load-file]: ./../datafeeds/MC-B01.xml:172439: parser error : Sequence ']]>' not allowed in content
```

The strange part is that the CDATA block containing the text shown in the errors seems to be correctly formed. (I can't post it here because of its adult nature.)

Any suggestions?

chaoskreator
  • What happens when you try without the `LIBXML_NOCDATA` param? Also, can you post some "sample" XML (you can strip out some content if needed)? – webnoob May 17 '12 at 05:58
  • This doesn't answer the question, but you don't need to use `LIBXML_NOCDATA`. This is a persistent myth about SimpleXML. `$name = (string)$items[$i]->Product_Name` should work fine. – IMSoP Oct 17 '12 at 17:30
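
To illustrate IMSoP's point: a minimal sketch, using a made-up inline document in place of the real feed (the `<Products>`/`<Items>`/`<Product_Name>` layout is an assumption), showing that a `(string)` cast returns the text of a CDATA section even when the document is loaded without `LIBXML_NOCDATA`.

```php
<?php
// Hypothetical sample feed; the real MC-B01.xml structure is not shown in the question
$sample = <<<XML
<Products>
  <Items><Product_Name><![CDATA[First & best]]></Product_Name></Items>
  <Items><Product_Name><![CDATA[Second <item>]]></Product_Name></Items>
</Products>
XML;

// Loaded WITHOUT LIBXML_NOCDATA
$xml = simplexml_load_string($sample);

$names = [];
foreach ($xml->Items as $item) {
    // Casting to string returns the text inside the CDATA section
    $names[] = (string) $item->Product_Name;
}

echo implode("\n", $names), "\n";
```

The CDATA content only looks "missing" when the object is dumped with `print_r()` or `var_dump()`; reading it through a string cast works either way.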

2 Answers


CDATA doesn't mean you can put anything inside it; it only means that characters which might be confused with markup are not interpreted as markup by the parser. The characters inside a CDATA section must still be legal XML characters, so yours most likely include control characters other than TAB, CR and LF (the `PCDATA invalid Char value 3` error points to U+0003), or the non-characters U+FFFE/U+FFFF. Remove them and you'll have a bright day!
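
One way to remove such characters before parsing is to filter the raw file contents against the character ranges XML 1.0 allows (`#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`). A minimal sketch, assuming the feed is UTF-8 encoded; `strip_invalid_xml_chars()` is a hypothetical helper name, not a PHP built-in.

```php
<?php
/**
 * Hypothetical helper: remove every character that XML 1.0 does not allow.
 * Assumes $data is valid UTF-8; with the /u modifier, preg_replace()
 * returns null on malformed UTF-8, so that case is reported as an error.
 */
function strip_invalid_xml_chars(string $data): string
{
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $data
    );
    if ($clean === null) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $clean;
}

// A CDATA section containing U+0003, the character behind "PCDATA invalid Char value 3"
$raw = "<Products><Items><Product_Name><![CDATA[Pipedream\x03]]></Product_Name></Items></Products>";

$xml = simplexml_load_string(strip_invalid_xml_chars($raw));
echo (string) $xml->Items->Product_Name, "\n"; // prints: Pipedream
```

For the file in the question this would look something like `$xml = simplexml_load_string(strip_invalid_xml_chars(file_get_contents($file)));`. Note that the offending characters are silently dropped, not replaced.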

Scott Chu
  • Yes, my problem was due to non-printable characters in the CDATA section. [The solution here fixed it](http://stackoverflow.com/questions/8781911/remove-non-ascii-characters-from-string-in-php) – Peter Gluck Aug 17 '14 at 06:27

Try saving the document as an XML file locally on your workstation and opening it in Internet Explorer or Firefox (or any other tool that can parse and validate an XML document), then fix whatever error it reports.
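
If a browser isn't convenient, PHP can report the same parser diagnostics itself. A minimal sketch using `libxml_use_internal_errors()` and `libxml_get_errors()`, which collect libxml's errors (with line and column numbers) instead of emitting warnings; the inline sample document is hypothetical and stands in for the real feed.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings
libxml_use_internal_errors(true);

// Hypothetical broken input: a control character (U+0003) inside CDATA on line 2
$raw = "<Products>\n<Items><![CDATA[fun\x03]]></Items>\n</Products>";

$xml = simplexml_load_string($raw);

$report = [];
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries the line/column where the parser complained
        $report[] = sprintf('line %d, column %d: %s', $error->line, $error->column, trim($error->message));
    }
    libxml_clear_errors();
}

echo implode("\n", $report), "\n";
```

For the real feed, `simplexml_load_file($file)` can be used in place of `simplexml_load_string($raw)`.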

It looks to me like some non-standard character in the middle of your CDATA section is preventing the section from being terminated properly.
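
To find where such characters sit in a large feed (the errors in the question point at line 172439), a small scan can list the line number and code point of each disallowed control character. A minimal sketch; it only checks the C0 control range minus TAB, LF and CR, which is an assumption about what is likely to appear in the feed, and `find_control_chars()` is a hypothetical helper. Scanning bytes is safe here because these bytes never occur inside a multibyte UTF-8 sequence.

```php
<?php
/**
 * Hypothetical helper: return [line, code point] pairs for every C0 control
 * character other than TAB (0x09), LF (0x0A) and CR (0x0D).
 */
function find_control_chars(string $data): array
{
    $hits = [];
    foreach (explode("\n", $data) as $index => $line) {
        if (preg_match_all('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $line, $matches)) {
            foreach ($matches[0] as $char) {
                $hits[] = [$index + 1, ord($char)];
            }
        }
    }
    return $hits;
}

$sample = "<Products>\n<Items><![CDATA[ok]]></Items>\n<Items><![CDATA[bad\x03]]></Items>\n</Products>";

foreach (find_control_chars($sample) as [$line, $code]) {
    printf("line %d: U+%04X\n", $line, $code); // prints: line 3: U+0003
}
```

Against the real file this would be `find_control_chars(file_get_contents($file))`.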

deej