
If I use the following PHP code to convert XML to JSON:

<?php

header("Content-Type: application/json");

$resultXML = "
<QUERY>
   <Company>fcsf</Company>
   <Details>
      fgrtgrthtyfgvb
   </Details>
</QUERY>
";

$sxml = simplexml_load_string($resultXML);
echo  json_encode($sxml);
?>

I get:

{"Company":"fcsf","Details":"\n      fgrtgrthtyfgvb\n   "}

However, if I use CDATA in the Details element as follows:

<?php

header("Content-Type: application/json");

$resultXML = "
<QUERY>
   <Company>fcsf</Company>
   <Details><![CDATA[
      fgrtgrthtyfgvb]]>
   </Details>
</QUERY>
";

$sxml = simplexml_load_string($resultXML);
echo  json_encode($sxml);

?>

I get the following:

{"Company":"fcsf","Details":{}}

In this case the Details element comes out empty. Any idea why, and how to correct this?

Ketan
  • Did you try removing <![CDATA[ and ]]> first? $resultXML = str_replace('<![CDATA[', '', $resultXML); $resultXML = str_replace(']]>', '', $resultXML); – Tom Feb 04 '14 at 10:22

1 Answer

This is not a problem with the JSON encoding as such: var_dump($sxml->Details) shows that the element already looks empty before json_encode() is involved, as you will only get

object(SimpleXMLElement)#2 (0) {
}

which is an apparently “empty” SimpleXMLElement; the CDATA content does not show up in it.

Once that is established, searching for “simplexml cdata” leads straight to the first user comment on the PHP manual page for the SimpleXML functions, which has the solution:

If you are having trouble accessing CDATA in your simplexml document, you don't need to str_replace/preg_replace the CDATA out before loading it with simplexml.

You can do this instead, and all your CDATA contents will be merged into the element contents as strings.

$xml = simplexml_load_file($xmlfile, 'SimpleXMLElement', LIBXML_NOCDATA);

So, use

$sxml = simplexml_load_string($resultXML, 'SimpleXMLElement', LIBXML_NOCDATA);

in your code, and you’ll get

{"Company":"fcsf","Details":"\n      fgrtgrthtyfgvb\n   "}

after JSON-encoding it.
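
To compare the two behaviours side by side, here is a minimal sketch that reuses the XML from the question. The heredoc and the variable names ($plain, $merged, $jsonPlain, $jsonMerged) are only illustrative and not part of the original code; going by the outputs quoted above, it should print the empty-object version first and the version with the Details text second.

```php
<?php
// Same XML as in the question, with the Details text wrapped in CDATA.
// (A heredoc is used here purely for readability; that is an assumption
// of this sketch, the question used a double-quoted string.)
$resultXML = <<<XML
<QUERY>
   <Company>fcsf</Company>
   <Details><![CDATA[
      fgrtgrthtyfgvb]]>
   </Details>
</QUERY>
XML;

// Default parsing: the CDATA text does not make it into the JSON output.
$plain = simplexml_load_string($resultXML);
$jsonPlain = json_encode($plain);

// With LIBXML_NOCDATA, CDATA sections are merged into ordinary text nodes,
// so json_encode() sees Details as a plain string.
$merged = simplexml_load_string($resultXML, 'SimpleXMLElement', LIBXML_NOCDATA);
$jsonMerged = json_encode($merged);

echo $jsonPlain, PHP_EOL;
echo $jsonMerged, PHP_EOL;
```

The whitespace around the text is kept as-is, so you may still want to trim() the value after decoding.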

CBroe
  • The first sentence of this answer is wrong: SimpleXML has _parsed_ the CData node just fine, but neither `var_dump` nor `json_encode` _output_ it. If you access it directly, asking for the string content with `(string)`, you will see it is there just fine: https://3v4l.org/c2FoQ Blindly converting XML to JSON is simply not one of the design goals of SimpleXML, and this is just one of several problems you'll encounter trying to use it for that. – IMSoP Feb 07 '22 at 11:53
  • But this removes all html tags – aprinciple Jul 20 '22 at 10:07
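
As a follow-up to IMSoP's comment, here is a rough sketch of reading the CDATA content directly with a (string) cast and building the JSON structure by hand, without LIBXML_NOCDATA. The $data array layout and the trim() call are assumptions made for illustration, not something taken from the thread.

```php
<?php
// Same XML as in the question (heredoc used only for readability).
$resultXML = <<<XML
<QUERY>
   <Company>fcsf</Company>
   <Details><![CDATA[
      fgrtgrthtyfgvb]]>
   </Details>
</QUERY>
XML;

// Parse with the default options, i.e. without LIBXML_NOCDATA.
$sxml = simplexml_load_string($resultXML);

// Casting an element to string returns its text content, CDATA included.
$details = (string) $sxml->Details;

// Build the output explicitly instead of passing the SimpleXML object
// to json_encode(). trim() drops the surrounding whitespace; whether
// you want that is an assumption of this sketch.
$data = [
    'Company' => (string) $sxml->Company,
    'Details' => trim($details),
];

echo json_encode($data), PHP_EOL;
```

Picking out the fields you need this way takes more code than encoding the whole object, but it makes the shape of the JSON explicit and does not depend on how json_encode() happens to treat SimpleXML objects.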