118

I noticed that when using SimpleXMLElement on a document that contains those CDATA tags, the content is always NULL. How do I fix this?

Also, sorry for spamming about XML here. I have been trying to get an XML-based script to work for several hours now...

<content><![CDATA[Hello, world!]]></content>

I tried the first hit on Google if you search for "SimpleXMLElement cdata", but that didn't work.

Angelo
  • How are you trying to access the node value? And, is SimpleXML a requirement? – allnightgrocery Jun 03 '10 at 23:58
  • I tried every other function (xml2array and all) that I could find on the web and SimpleXML seems to be the only one that gives GOOD results, except for the CDATA not working. – Angelo Jun 04 '10 at 00:02
  • 1
    We do a lot of XML parsing at work using DOMDocument (http://www.php.net/manual/en/class.domdocument.php). It works just fine in handling CDATA. Give that a shot, or post a little more code so we can see how you're working with SimpleXML. – allnightgrocery Jun 04 '10 at 00:51

6 Answers

205

You're probably not accessing it correctly. You can output it directly or cast it to a string. (In this example the cast is superfluous, as echo does it automatically anyway.)

$content = simplexml_load_string(
    '<content><![CDATA[Hello, world!]]></content>'
);
echo (string) $content;

// or with parent element:

$foo = simplexml_load_string(
    '<foo><content><![CDATA[Hello, world!]]></content></foo>'
);
echo (string) $foo->content;

You might have better luck with LIBXML_NOCDATA:

$content = simplexml_load_string(
    '<content><![CDATA[Hello, world!]]></content>'
    , null
    , LIBXML_NOCDATA
);
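
To illustrate the point above, here is a minimal sketch (the variable names are made up for the example) that parses the same document with and without LIBXML_NOCDATA and casts the element to a string in both cases. It suggests the flag does not change what a string cast returns:

```php
<?php
// Illustrative sketch: $plain, $merged, $plainText and $mergedText are hypothetical names.
$xml = '<foo><content><![CDATA[Hello, world!]]></content></foo>';

// Default parsing: the CDATA section is kept as its own node type.
$plain = simplexml_load_string($xml);

// With LIBXML_NOCDATA: the CDATA section is merged into an ordinary text node.
$merged = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// Casting the element to a string yields the text either way.
$plainText  = (string) $plain->content;
$mergedText = (string) $merged->content;

echo $plainText, "\n";   // Hello, world!
echo $mergedText, "\n";  // Hello, world!
```

If the cast already gives you the text, the flag is probably only affecting debug output such as print_r or var_dump.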
hakre
Josh Davis
  • 4
    No, PHP skips CDATA completely for some reason. Any other ideas? – Angelo Jun 04 '10 at 00:24
  • 4
    Then it's a bug. Upgrade PHP/libxml until it works (I've never had any problems with CDATA and SimpleXML.) You may want to try your luck with LIBXML_NOCDATA otherwise. – Josh Davis Jun 04 '10 at 01:56
  • Right on. Without the LIBXML_NOCDATA, the XML comes in as false - regardless of how it's done. I was able to prove that out with both creation methods... $x = new SimpleXMLElement('<![CDATA[Hello, world!]]>', LIBXML_NOCDATA); $y = simplexml_load_string('<![CDATA[Hello, world!]]>', "SimpleXMLElement", LIBXML_NOCDATA); print_r($y); Without that option, they're both null. Just wanted to back up your assertion. – allnightgrocery Jun 04 '10 at 02:09
  • LIBXML_NOCDATA should be the second parameter, not the third! Otherwise it works fine, +1. – Pavel S. Aug 20 '13 at 13:33
  • 8
    I know this is an old answer, but I would like to stress that **the first part of this answer is correct**. When you print the result with `print_r` you are indeed not accessing it correctly. Write the code you actually want, probably with `echo` or a `(string)` cast, and you will find the content is fine. **Do not use LIBXML_NOCDATA; it is irrelevant.** – IMSoP May 05 '14 at 01:26
  • While debugging an application, `var_dump`'ing a SimpleXMLElement containing CDATA sections doesn't show the nodes' content. But `var_dump`'ing this did the job: `simplexml_load_string($simplexml->asXML(), null, LIBXML_NOCDATA)` – Gras Double May 06 '14 at 00:17
  • 10
    @IMSoP Adding LIBXML_NOCDATA (and changing nothing else) works, so I'm not so sure it is irrelevant. – rand Feb 06 '15 at 10:55
  • 1
    @SimonePalazzo Adding LIBXML_NOCDATA fixes `print_r` and `var_dump` output, yes. It does not fix any code you should actually be using in production, because whenever you actually try to use that string, you'll find that the CDATA was there all along. – IMSoP Feb 06 '15 at 11:03
  • @IMSoP Well, then it's not working for me :) I'm using simplexml_load_string + convert object to array + edit array + convert back to xml, and without LIBXML_NOCDATA it does not work, i.e. the corresponding field is empty (don't know if null or empty string). – rand Feb 06 '15 at 11:20
  • @SimonePalazzo Your mistake is converting the SimpleXML object to an array - that's not what SimpleXML is designed for. You should be using `foreach`, `->element`, `['attribute']`, etc on the SimpleXML object itself. See: http://php.net/manual/en/simplexml.examples-basic.php Or alternatively, you should be using a different parser to produce an array more suited to your needs. Or using the DOM interface, which has better editing functions. – IMSoP Feb 06 '15 at 12:03
  • @IMSoP I see... but why does LIBXML_NOCDATA help then? – rand Feb 06 '15 at 16:54
  • 4
    @SimonePalazzo XML consists of various different "nodes" - e.g. `a text node <![CDATA a cdata node]]> another text node`. The CDATA and text nodes are different types, and SimpleXML tracks this so you can get back the XML you put in. When you squeeze a SimpleXML object into an array, it throws away a lot of information - CDATA nodes, comments, any element not in the current namespace (e.g. ``), the position of the child element in the text, etc. `LIBXML_NOCDATA` converts CDATA nodes into text nodes, but doesn't fix the rest. – IMSoP Feb 07 '15 at 15:54
  • For full reference: [Predefined Constants](https://secure.php.net/manual/en/libxml.constants.php) – Marcio Mazzucato Jun 08 '17 at 16:01
68

LIBXML_NOCDATA can be passed as the optional third parameter of the simplexml_load_file() function. The returned XML object then has all CDATA sections converted into ordinary text.

$xml = simplexml_load_file($this->filename, 'SimpleXMLElement', LIBXML_NOCDATA);
echo "<pre>";
print_r($xml);
echo "</pre>";


Fix CDATA in SimpleXML

Pradip Kharbuja
15

This works perfectly for me.

$content = simplexml_load_string(
    $raw_xml
    , null
    , LIBXML_NOCDATA
);
Tunaki
VijayRana
14

This did the trick for me:

echo trim($entry->title);
hakre
brz
1

When to use LIBXML_NOCDATA?

I had this issue when transforming XML to JSON.

$xml = simplexml_load_string("<foo><content><![CDATA[Hello, world!]]></content></foo>");
echo json_encode($xml); // the second argument of json_encode() is a flags bitmask, not a boolean
/* prints
   {
     "content": {}
   }
 */

When you access the element on the SimpleXMLElement object directly, you do get the CDATA:

$xml = simplexml_load_string("<foo><content><![CDATA[Hello, world!]]></content></foo>");
echo $xml->content; 
/* prints
   Hello, world!
*/

It makes sense to use LIBXML_NOCDATA here because json_encode doesn't access the SimpleXMLElement in a way that triggers its string casting feature (I'm guessing a __toString() equivalent).

$xml = simplexml_load_string("<foo><content><![CDATA[Hello, world!]]></content></foo>", null, LIBXML_NOCDATA);
echo json_encode($xml);
/*
 {
   "content": "Hello, world!"
 }
*/
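
If you would rather not pass the flag, one possible alternative (a sketch that is not part of the original answer; `$data` and `$json` are hypothetical names) is to cast each child element to a string yourself before encoding. It assumes a flat document in which every child of the root is a leaf element with a unique name:

```php
<?php
// Illustrative sketch: builds a plain array with explicit string casts,
// so json_encode() never sees a SimpleXMLElement.
$xml = simplexml_load_string(
    '<foo><content><![CDATA[Hello, world!]]></content></foo>'
);

$data = [];
foreach ($xml->children() as $name => $child) {
    // The (string) cast returns the element's text, CDATA included.
    $data[$name] = (string) $child;
}

$json = json_encode($data);
echo $json, "\n"; // {"content":"Hello, world!"}
```

Nested elements, attributes, or repeated element names would need extra handling that this sketch does not attempt.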
Gabriel Glenn
0

When using the SimpleXMLElement class directly:

new SimpleXMLElement($rawXml, LIBXML_NOCDATA);
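
A slightly fuller sketch of the same call, with a sample value for `$rawXml` and a hypothetical `$element` variable added for illustration. Note that the options bitmask is the second argument of the constructor, whereas it is the third argument of simplexml_load_string() and simplexml_load_file():

```php
<?php
// Illustrative sketch: the sample document and variable names are assumptions.
$rawXml = '<foo><content><![CDATA[Hello, world!]]></content></foo>';

// Constructor form: the options bitmask comes second.
$element = new SimpleXMLElement($rawXml, LIBXML_NOCDATA);

// With the CDATA merged into text, json_encode() can see the value.
$json = json_encode($element);
echo $json, "\n"; // {"content":"Hello, world!"}
```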
kkochanski