
I'm developing a website and I need to load an XML file, say test.xml.

The XML nodes are well-formed, but the values inside them aren't. The value of every node is a string wrapped in CDATA, but the CDATA markup isn't always well-formed. Example:

<root>
 <data>
   <value1><![CDATA[Some value]]></value1>
   <value2><![CDATA[ ]]></value2>
   <value3>![CDATA[  ]]></value3>
 </data>
</root>

The real XML structure is more complex, but this example shows how CDATA is used. In node value3 the CDATA section is invalid: the '<' character before '![CDATA' is missing.

I tried to load the file with the following code:

<?php
  $xml = simplexml_load_file("test.xml"); 
?>

but I was getting warnings.

Then I tried LIBXML_NOCDATA, but that didn't help. The second version I tried was:

<?php
  $xml = simplexml_load_file("test.xml", null, LIBXML_NOCDATA); 
  //$xml = simplexml_load_file("test.xml", 'SimpleXMLElement', LIBXML_NOCDATA); 
?>

but I still get warnings (with either line).

Is it possible to load the file and then read values from it (e.g. $xml->data->value3), or not?

onlyme88
  • LIBXML_NOCDATA is not a magic bullet, and contrary to persistent myths, it is actually pretty useless with SimpleXML, because SimpleXML handles CDATA rather nicely by itself. I explained a bit about what it does here: http://stackoverflow.com/a/13981917/157957 Your problem is much more mundane: you have broken XML; the fact that the broken bits *should* be CDATA sections doesn't help, because they're broken, so they're not. – IMSoP May 05 '14 at 01:29
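
To see exactly what the parser objects to, libxml's errors can be collected instead of being emitted as PHP warnings. This is only a minimal sketch: the inline $raw string is a hypothetical stand-in for the contents of test.xml, and the exact error messages depend on the installed libxml2 version.

```php
<?php
// Collect libxml errors internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical inline copy of the broken document from the question.
$raw = <<<'XML'
<root>
 <data>
   <value1><![CDATA[Some value]]></value1>
   <value2><![CDATA[ ]]></value2>
   <value3>![CDATA[  ]]></value3>
 </data>
</root>
XML;

$xml = simplexml_load_string($raw);

// Fetch the collected errors, then empty the buffer for later parses.
$errors = libxml_get_errors();
libxml_clear_errors();

if ($xml === false) {
    foreach ($errors as $error) {
        // Each entry is a LibXMLError with line, column and message.
        printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
    }
}
```

The reported line should point at value3, which confirms that the document itself is broken rather than any parser flag being wrong.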

2 Answers


This is not a valid XML file, so you should repair it before using it. The simplest way is to use the Tidy library included with PHP:

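<?php
error_reporting(E_ALL);
$file = '1.xml';

// Let Tidy repair the markup before handing it to SimpleXML.
$tidy = new tidy();
$repaired = $tidy->repairFile($file, array(
    'input-xml' => true,      // treat the input as XML rather than HTML
    'escape-cdata' => false   // keep CDATA sections instead of converting them to plain text
));
var_dump(simplexml_load_string($repaired));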
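
If the only defect is the one shown in the question (a CDATA opener missing its leading '<'), a targeted string fix before parsing may be enough. The following is a rough sketch, not a general repair tool: the repairCdataOpeners() helper and the inline $raw sample are hypothetical, and it assumes no legitimate CDATA content contains the literal text "![CDATA[".

```php
<?php
// Hypothetical helper: restore the '<' in front of any "![CDATA[" that lacks one.
function repairCdataOpeners(string $raw): string
{
    // (?<!<) is a negative lookbehind, so intact "<![CDATA[" openers are left alone.
    return preg_replace('/(?<!<)!\[CDATA\[/', '<![CDATA[', $raw);
}

// Hypothetical inline copy of the document from the question.
$raw = <<<'XML'
<root>
 <data>
   <value1><![CDATA[Some value]]></value1>
   <value2><![CDATA[ ]]></value2>
   <value3>![CDATA[  ]]></value3>
 </data>
</root>
XML;

$xml = simplexml_load_string(repairCdataOpeners($raw));

// Casting an element to string yields its text, CDATA included.
var_dump((string) $xml->data->value3);
```

For a real file, the same helper could be applied to the result of file_get_contents('test.xml') before calling simplexml_load_string().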

If you're getting bad XML the right approach is always to find out why, and eliminate the root cause. If it's a data feed over which you genuinely have no control, seriously consider not using it: if the quality is so poor, is the data really worth having?

Michael Kay