
I have a PHP page that parses some XML text. That text comes from user input in an HTML text field.

Whenever there is any whitespace at all between nodes, the DOMDocument XML parser fails to parse the document the way I expect. It recognizes the first node, but my code cannot find any of the nested nodes.

With the whitespace removed, it works without a problem.

$xmldoc = new DOMDocument();
$xmldoc->loadXML($rawxml);

$top = $xmldoc->documentElement;
if(!$top) {echo "error: xml config is empty"; exit(-1);}
if($top->nodeName != "config") die("error: expect config tag as first element");


$nameNode = $top->childNodes->item(0);

//Fails here
if($nameNode->nodeName != "name") die("error: expect name tag following config tag");

Works

<config><name>sdf2</name></config>

Does not work

<config>   <name>sdf2</name></config>
user623879
  • I'm dumb... could have used a regex: $rawxml = preg_replace("/>\s+</", "><", $rawxml); – user623879 Sep 15 '11 at 09:20
  • @user623879 That's the wrong way. Read http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php – OZ_ Sep 15 '11 at 09:25

1 Answer


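This is expected behavior. When you load a formatted XML document with DOM, any whitespace between nodes (indentation, line breaks) becomes part of the DOM as DOMText instances by default, just like node values do. In your second example, childNodes->item(0) is therefore the whitespace text node, not the name element. You can disable this by doing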

$xmldoc->preserveWhiteSpace = false;

before loading the XML which will then discard any formatting whitespace. For a more detailed answer see
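
To make the difference concrete, here is a minimal sketch that parses the failing input from the question twice, once with each setting. The helper name firstChildName is made up for this illustration and is not part of the DOM API; the rest uses only DOMDocument as shown above.

```php
<?php
// Hypothetical helper (illustration only): returns the node name of the
// first child of the root element, with or without whitespace preservation.
function firstChildName(string $rawxml, bool $preserve): string
{
    $doc = new DOMDocument();
    // Set this before loadXML(); it does not change an already parsed tree.
    $doc->preserveWhiteSpace = $preserve;
    $doc->loadXML($rawxml);

    return $doc->documentElement->childNodes->item(0)->nodeName;
}

// The input from the question that "does not work".
$rawxml = '<config>   <name>sdf2</name></config>';

echo firstChildName($rawxml, true), "\n";  // #text (the three spaces)
echo firstChildName($rawxml, false), "\n"; // name
```

With the default setting, the first child is the whitespace text node, which is why the nodeName check in the question fails. If you cannot control the flag, another option is to loop over childNodes and skip every node whose nodeType is not XML_ELEMENT_NODE.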

Gordon