I'm generating an XML file from data held in a MySQL database, using PHP's DOMDocument to create the XML structure, but I'm struggling with the apostrophe in some of the text. The file I'm trying to replicate from a legacy system encodes the apostrophe as &apos;. When I echo $dom->saveXML() to the screen the results look OK (the apostrophe appears as &apos;), but when I use $dom->save() to write the text to a file, the apostrophe appears as &amp;apos;, i.e. it appears to be double-escaping the text and encoding the ampersand.
I've been scouring many threads on this over the last few days to see if there is anything I've missed, and my last round of testing was based on an earlier question here: "PHP How to use &quot; entities in XML with DOMdocument", which was started nearly 4.5 years ago.
I've also tried different methods, including htmlspecialchars and htmlentities with various combinations of the flags and with double_encode set to false.
With htmlspecialchars, I'm following the advice in the PHP manual that single quotes are only translated to &apos; when ENT_QUOTES is set together with ENT_XML1, ENT_XHTML or ENT_HTML5. I've tried all three of those.
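As a minimal sketch of that flag behaviour (the sample string and variable names are made up for illustration), this is how I understand the single quote to be handled under the different flag combinations:

```php
<?php
// Hypothetical sample string containing a single quote.
$s = "it's";

// ENT_QUOTES with the HTML 4.01 doctype: numeric reference.
$html401 = htmlspecialchars($s, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// ENT_QUOTES combined with ENT_XML1: named entity.
$xml1 = htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');

// ENT_XML1 without ENT_QUOTES: the single quote is left untouched.
$noQuotes = htmlspecialchars($s, ENT_COMPAT | ENT_XML1, 'UTF-8');

echo $html401, "\n";  // it&#039;s
echo $xml1, "\n";     // it&apos;s
echo $noQuotes, "\n"; // it's
```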
Moving on to code examples to help illustrate the problem:
This is mostly taken from Jack's accepted answer to the question linked above, with the addition of the htmlspecialchars function wrapped around the content for the text node.
$dom1 = new DOMDocument;
$e = $dom1->createElement('description');
$content = 'single quote: \', double quote: ", opening tag: <, ampersand: &, closing tag: this has changed 02 >';
// Escape the content before handing it to the text node (double_encode is false).
$t = $dom1->createTextNode(htmlspecialchars($content, ENT_XML1 | ENT_QUOTES, 'UTF-8', false));
$e->appendChild($t);
$dom1->appendChild($e);
// Echo the serialised document to the screen.
echo '#results: '.$dom1->saveXML();
$test1 = $dom1->saveXML();
// Write the same document to a file.
$dom1->save("./exports/"."testing_dom.xml");
Echoing the results to screen gives the output I'm looking for, i.e. in addition to the ampersand, less-than and greater-than characters being encoded as &amp;, &lt; and &gt; respectively, the double quote and single quote are encoded as &quot; and &apos;.
#results: single quote: &apos;, double quote: &quot;, opening tag: &lt;, ampersand: &amp;, closing tag: this has changed 02 &gt;
The last line of the code above saves the results to a testing_dom.xml file, the contents of which appear as follows:
<?xml version="1.0"?>
<description>single quote: &amp;apos;, double quote: &amp;quot;, opening tag: &amp;lt;, ampersand: &amp;amp;, closing tag: this has changed 02 &amp;gt;</description>
Here all of the entities seem to have their leading ampersand escaped a second time, i.e. &apos; becomes &amp;apos;.
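One check I can think of is to compare the string returned by saveXML() with the bytes written by save(), and to escape the string once more before echoing it so the raw markup is visible. This is only a sketch: it assumes the echoed output is being viewed in a browser, which would decode entities once before displaying them, and it uses a hypothetical temporary file and sample string rather than my real export path and data.

```php
<?php
// Build a small document with a pre-escaped text node, as in the code above.
$dom = new DOMDocument();
$e = $dom->createElement('description');
$escaped = htmlspecialchars("it's", ENT_XML1 | ENT_QUOTES, 'UTF-8', false);
$e->appendChild($dom->createTextNode($escaped));
$dom->appendChild($e);

// Serialise to a string.
$fromString = $dom->saveXML();

// Serialise to a file (hypothetical temp path, used only for this check).
$path = tempnam(sys_get_temp_dir(), 'dom');
$dom->save($path);
$fromFile = file_get_contents($path);
unlink($path);

// Compare the two serialisations byte for byte.
var_dump($fromString === $fromFile);

// Escape the markup once more so a browser displays the raw bytes.
echo htmlspecialchars($fromString, ENT_QUOTES, 'UTF-8');
```

If the comparison prints bool(true), the file and the string are the same, and the difference I see on screen would come from the browser rendering &amp;apos; as &apos; rather than from save() itself.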
Is there something I'm missing here with saving the file?