Hello,

I have an endpoint that returns XML data whose content contains HTML.

So far, simplexml_load_string($result, "SimpleXMLElement", LIBXML_NOCDATA); returns a stripped version, i.e. all the tags are removed.

Sample Data

<JUDGMENT>
    <summary>INTRODUCTION: <br />This appeal borders on the Law of Contract.<br /><br />FACTS: <br />From the statement of claim filed by the plaintiff (respondent) breach of contract and negligence were said to have been committed against it, by the appellant. Amongst other prayers, the respondent claimed:-<br />(a) one hundred million naira as general damages for malicious breach of contract and negligent conduct;<br />(b) sixty-four million, two hundred and fifty thousand, nine hundred and twenty naira as special damages;<br /></summary>
</JUDGMENT>

Parsing through my method

protected function scaffoldXML($result)
{
    $xml = simplexml_load_string($result, "SimpleXMLElement", LIBXML_NOCDATA);
    return json_encode($xml);
}

Returns

"summary":"INTRODUCTION: This appeal borders on the Law of Contract.FACTS: From the statement of claim filed by the plaintiff (respondent) breach of contract and negligence were said to have been committed against it, by the appellant. Amongst other prayers, the respondent claimed:-(a) one hundred million naira as general damages for malicious breach of contract and negligent conduct;(b) sixty-four million, two hundred and fifty thousand, nine hundred and twenty naira as special damages;(c) one million, five hundred thousand naira only being the cost of this action; and(d) 10% interest per annum on the judgment sum, from the date of judgment until liquidation.The appellant denied the claim and on exchange of pleadings...

Question Again

I want to retain the HTML tags in the data while removing the XML tags.

Please note: it is a full XML document whose structure cannot be predicted, and I want to extract all the values in the XML, not just part of them.

funsholaniyi
  • The reason being that stringifying a SimpleXML object returns its text content. I'm not sure why you want to JSON encode an XML object when you already have the string (for which JSON encoding doesn't even make sense). – Dormilich May 08 '18 at 08:45
  • The encoding is to ultimately send the data as JSON. Mind you, this is only a snippet of the XML data; the issue is what the function returns. – funsholaniyi May 08 '18 at 08:51
  • You could get the data that you want first, then encode it. It's just a few lines, so I don't think it will hurt in any way. – Kevin May 08 '18 at 08:59
  • Since there is no CDATA in your XML you can directly pass the XML string to json_encode(). No need to use SimpleXML at all. – Dormilich May 08 '18 at 09:03
  • You’re making it really rather hard to understand what your actual issue here is then ... Is it that you want to extract only part of your input XML file? Then use `SimpleXMLElement::saveXML` to get an XML representation of a specific node again ... – CBroe May 08 '18 at 09:05
  • I just want to extract the HTML data from the XML content, not part of the XML. The function simplexml_load_string seems to strip all tags. – funsholaniyi May 08 '18 at 09:13
  • SimpleXML is not stripping any tags. You are losing content because you are trying to convert directly from XML to JSON, which will always be a lossy translation, because the two formats use different structures. Keep your function to one responsibility: extracting some HTML, as a string, from the XML; once you have a string, use a different function for the different responsibility of adding that to a new structure to be serialized as JSON, if that's actually what you need. – IMSoP May 08 '18 at 11:36
  • If you agree that your problem is "extracting a subset of the XML including tags", then this is a duplicate: https://stackoverflow.com/questions/1937056/php-simplexml-get-innerxml (Note that as far as any parser is concerned *there is no HTML here*, the `<br />` tags are just part of the XML). – IMSoP May 08 '18 at 11:39
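
The approach suggested in the comments (first extract the markup inside a node as a string, then build the JSON in a separate step) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a confirmed solution: innerXml(), xmlToArray() and the HTML_TAGS list are hypothetical names introduced here, and the list of tag names treated as "HTML" is a guess that would need adjusting to the real feed. Attributes, and text that sits next to structural child elements, are ignored by this sketch.

```php
<?php
// Assumption: these tag names count as embedded HTML; anything else is XML structure.
const HTML_TAGS = ['br', 'p', 'b', 'i', 'u', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'span', 'div'];

// Hypothetical helper: serialize everything inside $node (text and child
// elements) without the node's own start and end tags.
function innerXml(SimpleXMLElement $node): string
{
    $inner = '';
    // DOM exposes text nodes as well as elements, which SimpleXML iteration does not.
    $dom = dom_import_simplexml($node);
    foreach ($dom->childNodes as $child) {
        $inner .= $dom->ownerDocument->saveXML($child);
    }
    return $inner;
}

// Hypothetical walker: recurse through structural elements and keep the
// markup of any element whose children are all in HTML_TAGS (or that has none).
function xmlToArray(SimpleXMLElement $node)
{
    $structural = [];
    foreach ($node->children() as $child) {
        if (!in_array(strtolower($child->getName()), HTML_TAGS, true)) {
            $structural[] = $child;
        }
    }
    if ($structural === []) {
        return innerXml($node); // leaf: return its content with tags intact
    }
    $out = [];
    foreach ($structural as $child) {
        // Every key maps to a list so repeated element names are not overwritten.
        $out[$child->getName()][] = xmlToArray($child);
    }
    return $out;
}

// Shortened version of the sample data from the question.
$result = '<JUDGMENT><summary>INTRODUCTION: <br />This appeal borders on the Law of Contract.<br /><br />FACTS: <br /></summary></JUDGMENT>';

$xml  = simplexml_load_string($result);
$data = xmlToArray($xml);
$json = json_encode($data, JSON_UNESCAPED_SLASHES);
echo $json, PHP_EOL;
```

Because the parser sees the br elements as ordinary XML, they are re-serialized by libxml rather than copied byte for byte, so the spacing inside empty tags may differ from the input.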

0 Answers