I searched Stack Overflow for this and found an old, similar question:

Ignore comment node in SimpleXML [duplicate]

Unfortunately, in my opinion neither that question nor its duplicates answer mine.


Using this code:

        $testXml = <<<XML
        <root>
          <comment>this comment is part of my payload and should be parsed</comment>
          <node>
          </node>
          <!-- this comment should not be parsed-->
        </root>
        XML;

        $xmlDataTest = simplexml_load_string($testXml);
        var_dump($xmlDataTest);

I get:

object(SimpleXMLElement)#401 (2) {
  ["comment"]=>
  array(2) {
    [0]=>
    string(55) "this comment is part of my payload and should be parsed"
    [1]=>
    object(SimpleXMLElement)#403 (0) {
    }
  }
  ["node"]=>
  object(SimpleXMLElement)#402 (0) {
  }
}

But I would expect the commented-out content to be completely ignored:

object(SimpleXMLElement)#401 (2) {
  ["comment"]=>
  string(55) "this comment is part of my payload and should be parsed"
  ["node"]=>
  object(SimpleXMLElement)#402 (0) {
  }
}

Does anybody have an idea how to make simplexml_load_string ignore the second comment?

EDIT due to comments concerning var_dump relevance.

If instead I want to quickly convert from XML to JSON I can do:

$json = json_encode(simplexml_load_string($testXml), JSON_PRETTY_PRINT);

Here, too, I get different JSON depending on whether somebody put a comment in my XML. I either get nice and clean:

{
    "comment": "this comment is part of my payload and should be parsed",
    "node": {}
}

or ugly:

{
    "comment": [
        "this comment is part of my payload and should be parsed",
        {}
    ],
    "node": {}
}

I still feel it is very bad that comments change the behaviour of simplexml_load_string, though I know some of you will disagree. Anyway, I can handle it, and I thank you all for your good comments so far (I'll dispense some upvotes).

j3App
  • Not sure where you think the problem is with the `var_dump()`. If you `var_dump($xmlDataTest->comment);`, you only get the 1 node. The comment is still part of the XML, but it shouldn't be mixed in with things like `<comment>` nodes. – Nigel Ren Jan 04 '21 at 11:08
  • Seems to work as I'd expect, https://3v4l.org/9ICsL. – user3783243 Jan 04 '21 at 11:13
  • @NigelRen If I do `var_dump($xmlDataTest->comment);` I get an array with a string in it. I would, however, expect to get a string directly. As it is now, the XML parsing behaves differently depending on whether the XML contains true comments or not, and that is not good practice in my opinion. In most programming languages I have worked with, comments are not supposed to affect behaviour. – j3App Jan 04 '21 at 11:38
  • A `var_dump` should give you a SimpleXMLElement in either case (comment node present or not). If you want a string, then, as with any SimpleXMLElement, you need to cast it to a string: `var_dump((string)$xmlDataTest->comment);` – Nigel Ren Jan 04 '21 at 11:51
  • *As it is now, the XML parsing behaves differently depending on whether the XML contains true comments or not* - if this is the case, can you show your actual XML parsing code please? `var_dump` is just for displaying debug output, and if you're relying on it to inform how you should parse the XML, it might not be helpful. – iainn Jan 04 '21 at 12:05
  • @iainn My XML contains hundreds of nodes. I do `json_decode(json_encode(simplexmlelement));` in order to get a standard object, which I then traverse looking for certain content. Now for my payload comment (the important comment), I either get a string or an array depending on whether someone puts in a code comment or not. I can handle that, now that I have stumbled across it. But it did waste a lot of my time, and I did find it surprising. I would almost call it a bug. – j3App Jan 04 '21 at 12:18
  • Converting it to JSON and back is not helping - all you're doing is removing information specific to an XML document, **like comments**. A SimpleXMLElement is already an object that you can traverse, with its own API. If you use it, you'll avoid issues like this. – iainn Jan 04 '21 at 12:20

1 Answer

This is only debug output. If you access the value, the comment node is ignored:

$root = simplexml_load_string($testXml);
var_dump((string)$root->comment);

foreach ($root->comment as $element) {
    var_dump((string)$element);
}

Output:

string(55) "this comment is part of my payload and should be parsed"
string(55) "this comment is part of my payload and should be parsed"
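To check this without relying on `var_dump()`, here is a minimal sketch that counts and iterates the children through the SimpleXML API. The variable `$names` is only an illustrative name, and the snippet assumes the same test XML as in the question.

```php
<?php
// Same test document as in the question.
$testXml = <<<XML
<root>
  <comment>this comment is part of my payload and should be parsed</comment>
  <node>
  </node>
  <!-- this comment should not be parsed-->
</root>
XML;

$root = simplexml_load_string($testXml);

// Counting through the API only sees element nodes named "comment".
$commentCount = count($root->comment);

// children() iterates element nodes; collect their names.
$names = [];
foreach ($root->children() as $child) {
    $names[] = $child->getName();
}

var_dump($commentCount, $names);
```

As far as I can tell, the XML comment only shows up in the debug dump and in the JSON conversion, not when you go through the element API.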

However, if you want to be explicit, you could switch to DOM + XPath, which lets you handle each node type specifically.

$document = new DOMDocument();
$document->loadXML($testXml);
$xpath = new DOMXPath($document);

var_dump(
    [
        'element node' => $xpath->evaluate('string(/root/comment)'),
        'comment node' => $xpath->evaluate('string(/root/comment())')
    ]
);

Output:

array(2) {
  ["element node"]=>
  string(55) "this comment is part of my payload and should be parsed"
  ["comment node"]=>
  string(34) " this comment should not be parsed"
}
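If the goal is the quick XML-to-JSON conversion from the question, one possible approach (a sketch, not the only way) is to remove the comment nodes with DOM first and then hand the document to SimpleXML via `simplexml_import_dom()`. The variable names `$commentNode` and `$decoded` are just illustrative.

```php
<?php
// Same test document as in the question.
$testXml = <<<XML
<root>
  <comment>this comment is part of my payload and should be parsed</comment>
  <node>
  </node>
  <!-- this comment should not be parsed-->
</root>
XML;

$document = new DOMDocument();
$document->loadXML($testXml);
$xpath = new DOMXPath($document);

// Select every comment node in the document and detach it.
// The list is copied to an array first so removal cannot disturb the iteration.
foreach (iterator_to_array($xpath->query('//comment()')) as $commentNode) {
    $commentNode->parentNode->removeChild($commentNode);
}

// Continue with SimpleXML on the cleaned document.
$root = simplexml_import_dom($document);
$json = json_encode($root, JSON_PRETTY_PRINT);
$decoded = json_decode($json, true);

echo $json, PHP_EOL;
```

With the comment nodes gone, `comment` should come out as a plain string again rather than an array, so the JSON no longer depends on whether somebody left a comment in the XML.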
ThW