
How can I make SimpleXML replace HTML/XML entities with their respective characters in PHP?

Assume I have this XML document in a string:

$data = '<?xml version="1.0" encoding="ISO-8859-1"?><example>Tom &amp; Jerry</example>';

Obviously, I want SimpleXML to decode &amp; to &, but it does not seem to do so by default. I have tried these two ways, and neither worked:

$xml = new SimpleXMLElement($data);
$xml = new SimpleXMLElement($data, LIBXML_NOENT);
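
For reference, a minimal check using only the sample string above, assuming the goal is the element's text value (`$text` is a name introduced here for illustration). As far as I know, reading the element as a string already yields the decoded text, and `&amp;` only reappears when the tree is serialized back to XML with `asXML()`. `LIBXML_NOENT` is about substituting entities declared in a DTD; the predefined `&amp;` is decoded either way.

```php
<?php
// Sample document from the question.
$data = '<?xml version="1.0" encoding="ISO-8859-1"?><example>Tom &amp; Jerry</example>';

$xml = new SimpleXMLElement($data);

// Casting the element to string returns its text content, with &amp; already decoded.
$text = (string) $xml;
echo $text, "\n";          // Tom & Jerry

// asXML() serializes the tree back to XML, so the ampersand is escaped again.
echo $xml->asXML(), "\n";  // ...<example>Tom &amp; Jerry</example>
```

If the value was being inspected through `asXML()`, that alone would explain still seeing `&amp;`.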

What's the best way to get XML entities decoded? I would expect the XML parser to do it, and I would like to avoid running html_entity_decode before parsing (which doesn't work either). Could this be a problem with the encoding of the string? If so, how could I track it down and fix it?
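
On the encoding question, a small sketch that may help narrow it down. The document content below is hypothetical (only the ISO-8859-1 declaration comes from the question): it adds the Latin-1 byte 0xE9 ("é") so the effect of the declared encoding becomes visible. libxml decodes the input according to the declaration, and SimpleXML hands back UTF-8 strings.

```php
<?php
// Hypothetical variant of the question's document: the text contains the
// Latin-1 byte 0xE9 ("é") so the effect of the declared encoding is visible.
$data = '<?xml version="1.0" encoding="ISO-8859-1"?>'
      . "<example>Caf\xE9 &amp; Jerry</example>";

$xml = new SimpleXMLElement($data);
$text = (string) $xml;

// libxml reads the bytes as ISO-8859-1 (per the declaration) and SimpleXML
// returns UTF-8, so "é" comes back as the two bytes C3 A9.
var_dump($text === "Caf\xC3\xA9 & Jerry");   // bool(true)
echo bin2hex($text), "\n";                   // inspect the raw bytes
```

If the string is actually UTF-8 but still declares ISO-8859-1, each multi-byte character would be read as two Latin-1 characters. That kind of mismatch shows up in non-ASCII text; it does not affect how `&amp;` is decoded.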

Pavel S.

1 Answer


I'm not sure this will work in every case, but maybe:

$xml = new SimpleXMLElement(html_entity_decode($data));

http://www.php.net/manual/en/function.html-entity-decode.php
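
A quick way to see what this does to the question's own sample. This sketch uses `simplexml_load_string()`, which returns `false` instead of throwing, and `libxml_use_internal_errors()`, so parse errors are collected rather than printed as warnings. `$decoded` and `$result` are names introduced for illustration. Decoding the whole document turns `&amp;` into a bare `&`, which is not well-formed XML.

```php
<?php
$data = '<?xml version="1.0" encoding="ISO-8859-1"?><example>Tom &amp; Jerry</example>';

// Decoding the whole document turns "&amp;" into a bare "&" ...
$decoded = html_entity_decode($data);
echo $decoded, "\n";

// ... which is not well-formed XML, so parsing fails.
libxml_use_internal_errors(true);            // collect errors instead of emitting warnings
$result = simplexml_load_string($decoded);   // returns false on failure
var_dump($result);                           // bool(false)

foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
```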

hendr1x
  • I just reread your post... did you have the note about avoiding html_entity_decode in there before I answered? If so, my apologies. Perhaps you could do something like new SimpleXMLElement(str_replace("&amp;", "&", $data))? – hendr1x Aug 07 '13 at 16:00
  • Also, my assumption is that if SimpleXMLElement parses the data properly, then this is not an issue with the XML syntax; it is just how the data was entered into the XML document. You need to fix it before it gets in there, or as it comes out. – hendr1x Aug 07 '13 at 16:05
  • Unfortunately, this doesn't work if you end up decoding &lt; or &gt;, which breaks the XML. – Jeremy L. Apr 03 '21 at 06:36
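
Along the lines of fixing it "as it comes out": a minimal sketch, assuming the data was escaped twice before it reached you (the double-escaped input below is hypothetical, not from the question). Parsing first and calling html_entity_decode() only on the extracted text sidesteps the problems raised in these comments, because a decoded & or < is then just a character in a PHP string rather than markup.

```php
<?php
// Hypothetical double-escaped input: the text was escaped once before being
// written into the XML, and once more by the XML serializer.
$data = '<?xml version="1.0" encoding="ISO-8859-1"?>'
      . '<example>Tom &amp;amp; Jerry &amp;lt;3</example>';

$xml = new SimpleXMLElement($data);

// The parser removes one level of escaping ...
$raw = (string) $xml;                     // "Tom &amp; Jerry &lt;3"

// ... and html_entity_decode() removes the second one from the plain string.
// The document is already parsed, so a decoded "<" can no longer break it.
$text = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $text, "\n";                         // Tom & Jerry <3
```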