
When I pass a string that might contain troublesome characters (e.g. &, <, >) as an element value, DOMDocument emits a warning rather than escaping the string.

I'm looking for a succinct way to make strings XML-safe, ideally something that leverages the DOMDocument library.

I'm looking for something better than preg_replace or htmlspecialchars. I see DOMDocument::createTextNode(), but the resulting DOMText object is cumbersome and can't be handed to DOMDocument::createElement().

To illustrate the problem, this code:

<?php 

$dom = new DOMDocument;
$dom->formatOutput = true;
$parent = $dom->createElement('rootNode');
$parent->appendChild( $dom->createElement('name', 'this ampersand causes pain & sorrow ') );
$dom->appendChild( $parent );
echo $dom->saveXml();

produces this result (see eval.in):

Warning: DOMDocument::createElement(): unterminated entity reference          sorrow in /tmp/execpad-41ee778d3376/source-41ee778d3376 on line 6
<?xml version="1.0"?>
<rootNode>
  <name>this ampersand causes pain </name>
</rootNode>
doub1ejack
  • "Better than preg_replace or htmlspecialchars" - better in what respect? –  Feb 05 '15 at 18:08
  • preg_replace and htmlspecialchars are broad-spectrum tools. A preg_replace approach depends fully on the developer's knowledge of XML character issues. The htmlspecialchars approach [seems to be disputed](http://stackoverflow.com/questions/2822774/php-is-htmlentities-sufficient-for-creating-xml-safe-values). And since this issue is endemic to XML, I would expect an XML library to provide clean ways to deal with it. – doub1ejack Feb 05 '15 at 19:29
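For comparison, here is a minimal sketch of the htmlspecialchars() approach discussed above, assuming PHP 7+ and UTF-8 input; the helper name `xmlSafe()` is hypothetical. With the ENT_XML1 | ENT_QUOTES flags it emits only the five entities XML predefines (&amp;, &lt;, &gt;, &quot;, &apos;). It works with createElement() only because that method interprets entity references in its value argument, so the pre-escaped `&amp;` becomes a literal `&` in the text node and is escaped again on output.

```php
<?php
// Hypothetical helper: escape the XML special characters using only
// entities that every XML parser predefines (&amp; &lt; &gt; &quot; &apos;).
function xmlSafe(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

$dom = new DOMDocument();
$dom->formatOutput = true;
$parent = $dom->appendChild($dom->createElement('rootNode'));

// createElement() parses entity references in its value, so the escaped
// string becomes the literal text "this ampersand causes pain & sorrow".
$name = $dom->createElement('name', xmlSafe('this ampersand causes pain & sorrow'));
$parent->appendChild($name);

echo $dom->saveXML();
```

This still leans on createElement()'s entity parsing, which is exactly the behavior that caused the original warning, so it is a workaround rather than escaping done by the DOM itself.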

2 Answers


You will have to create the text node and append it. I described the problem in this answer: https://stackoverflow.com/a/22957785/2265374
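A minimal sketch of that two-step approach without subclassing, reusing the string from the question (a `<not a tag>` suffix is added purely for illustration): create the element with no value, append a DOMText node, and let saveXML() do the escaping. The round trip at the end checks that the text comes back unchanged.

```php
<?php
// Sketch: create the element without a value, then append a DOMText node.
// createTextNode() stores the string literally; the serializer escapes
// &, < and > when the document is saved.
$dom = new DOMDocument();
$dom->formatOutput = true;
$root = $dom->appendChild($dom->createElement('rootNode'));

$name = $dom->createElement('name');
$name->appendChild($dom->createTextNode('this ampersand causes pain & sorrow <not a tag>'));
$root->appendChild($name);

$xml = $dom->saveXML();
echo $xml;

// Round trip: parsing the output gives back the original, unescaped text.
$check = new DOMDocument();
$check->loadXML($xml);
echo $check->getElementsByTagName('name')->item(0)->textContent, "\n";
```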

However, you can extend DOMDocument and override createElement() and createElementNS() so that they do this for you.

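class MyDOMDocument extends DOMDocument {

  // Create the element without a value, then add the content as a text node,
  // which DOMDocument escapes (&, <, >) when the document is saved.
  // The attribute suppresses the PHP 8.1+ return-type deprecation notice;
  // before PHP 8 the line is read as an ordinary comment.
  #[\ReturnTypeWillChange]
  public function createElement($name, $content = '') {
    $node = parent::createElement($name);
    // cast to string so that 0 or '0' is still added as content
    if ((string)$content !== '') {
      $node->appendChild($this->createTextNode($content));
    }
    return $node;
  }

  #[\ReturnTypeWillChange]
  public function createElementNS($namespace, $name, $content = '') {
    $node = parent::createElementNS($namespace, $name);
    if ((string)$content !== '') {
      $node->appendChild($this->createTextNode($content));
    }
    return $node;
  }
}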

$dom = new MyDOMDocument();
$root = $dom->appendChild($dom->createElement('foo'));
$root->appendChild($dom->createElement('bar', 'Company & Son'));
$root->appendChild($dom->createElementNS('urn:bar', 'bar', 'Company & Son'));

$dom->formatOutput = TRUE;
echo $dom->saveXml();

Output:

<?xml version="1.0"?>
<foo>
  <bar>Company &amp; Son</bar>
  <bar xmlns="urn:bar">Company &amp; Son</bar>
</foo>
ThW
  • And unfortunately to be fair, the [documentation](http://php.net/manual/en/domdocument.createelement.php) does say this as well: _The value will not be escaped. Use DOMDocument::createTextNode() to create a text node with escaping support._ – Scuzzy Feb 05 '15 at 21:37
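The difference the documentation describes can be seen side by side in this sketch, assuming the classic DOMDocument class backed by libxml2. The value argument of createElement() is parsed for entity references (which is why a bare `&` triggered the "unterminated entity reference" warning in the question), while createTextNode() stores the string literally. The element names `viaValue` and `viaText` are made up for the example.

```php
<?php
$dom = new DOMDocument();
$root = $dom->appendChild($dom->createElement('root'));

// Value passed to createElement(): "&amp;" is read as an entity reference,
// so the element's text is "a & b".
$viaValue = $root->appendChild($dom->createElement('viaValue', 'a &amp; b'));

// Text node: the string is stored as-is, so the text is "a &amp; b"
// and its ampersand is escaped again on output.
$viaText = $root->appendChild($dom->createElement('viaText'));
$viaText->appendChild($dom->createTextNode('a &amp; b'));

echo $viaValue->textContent, "\n";
echo $viaText->textContent, "\n";
echo $dom->saveXML();
```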

This is the structure I use to build XML elements; the second part is usually wrapped in a function.

$parent = $document->documentElement; // pick the node we want to append to
$name = 'foo'; // new element name
$content = 'bar < not a tag > <![CDATA[" testing cdata "]]>'; // content, escaped on output

// a DOMDocument has no ownerDocument, so fall back to $parent itself
$owner = $parent->ownerDocument ?: $parent;
$element = $owner->createElement($name);
$parent->appendChild($element);
$element->appendChild($owner->createTextNode($content));

My function then returns $element.
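A sketch of how the structure above might be wrapped in a function, as the answer describes. The name `appendTextElement()` and its signature are hypothetical, and the empty-content check mirrors the one in the other answer. Passing the DOMDocument itself as `$parent` exercises the `ownerDocument` fallback.

```php
<?php
// Hypothetical helper wrapping the structure above: creates $name, appends it
// to $parent, and adds $content as a text node (escaped on output).
function appendTextElement(DOMNode $parent, string $name, string $content = ''): DOMElement
{
    // A DOMDocument has no ownerDocument, so fall back to $parent itself.
    $document = $parent->ownerDocument ?: $parent;

    $element = $document->createElement($name);
    $parent->appendChild($element);
    if ($content !== '') {
        $element->appendChild($document->createTextNode($content));
    }

    return $element;
}

$document = new DOMDocument();
$root = appendTextElement($document, 'root');
appendTextElement($root, 'foo', 'bar < not a tag > <![CDATA[" testing cdata "]]>');

echo $document->saveXML();
```

The output contains no real CDATA section: `<![CDATA[` in the content is ordinary text and is escaped like everything else.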

Scuzzy