
I've stumbled across this behavior on PHP 5.6 (it is identical in PHP 5.4 up to 7.0).

```php
<?php
$note = new SimpleXMLElement('<Note></Note>');
$note->addChild("string0", 'just a string');
$note->addChild("string1", "abc\n\n\n");
$note->addChild("string2", "\tdef");
$note->addChild("string3", "\n\n\n");
$note->addChild("string4", "\t\n");

$json = json_encode($note, JSON_PRETTY_PRINT);

print($json);
```

Outputs:

```json
{
    "string0": "just a string",
    "string1": "abc\n\n\n",
    "string2": "\tdef",
    "string3": {
        "0": "\n\n\n"
    },
    "string4": {
        "0": "\t\n"
    }
}
```

There must be a reason behind this behavior, and I would like to understand it. Also, if you know of a way to force `json_encode` to treat whitespace-only strings the same way as other strings of text, I would appreciate you sharing your ideas!

Edit: here's a snippet you can run: http://sandbox.onlinephpfunctions.com/code/d797623553c11b7a7648340880a92e98b19d1925

Vallieres
  • I can't reproduce this running PHP 5.5.9. For me, string3 and string4 are just blank whitespace. However, curiously enough, the whitespace characters are taken literally, the same as in your example for string1 and string2. – Jeff Puckett Jul 20 '16 at 17:05
  • Added the snippet to my question. – Vallieres Jul 20 '16 at 17:07
  • @JeffPuckettII you are right about 5.5, but most 5.6 versions produce the above result, as do all the PHP 7 versions I could test. – Vallieres Jul 20 '16 at 17:08
  • I see, your question does say *"also identical in PHP **5.4 up to** 7.0"* – Jeff Puckett Jul 20 '16 at 17:12
  • If I don't use the `SimpleXMLElement` class, then I do get the same results when using an array instead `$note = [ "string0" => 'just a string', "string1" => "abc\n\n\n", "string2" => "\tdef", "string3" => "\n\n\n", "string4" => "\t\n" ];` – Jeff Puckett Jul 20 '16 at 17:14
  • Actually, all versions of `json_encode` do it this way. You may have to check the EOL versions box, but going back to 5.2 (where it was introduced), they all act the same: https://3v4l.org/IfBKK – Machavity Jul 20 '16 at 17:22
  • I did try a few of each dot release and noticed the same behavior. However, many 5.5 releases do not behave the same way. – Vallieres Jul 20 '16 at 18:49
  • What is the expected output? I reckon you're wondering where the `"0"` node comes from. – Salman A Jul 21 '16 at 10:55
  • Directly JSON encoding an XML document is always going to be tricky, because there's not a trivial mapping for every possible structure. And SimpleXML really isn't designed for that task, so isn't going to have considered all the edge cases. That doesn't exactly answer why it gives that output though. – IMSoP Aug 02 '16 at 16:39
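Jeff Puckett's array comparison can be reproduced as a standalone script. This is a minimal sketch assuming PHP 5.4+ (short array syntax and `JSON_PRETTY_PRINT`): it encodes the same five values from a plain array instead of a `SimpleXMLElement` and checks that each one, whitespace-only or not, decodes back as an ordinary string.

```php
<?php
// Same five values as in the question, held in a plain PHP array
// instead of a SimpleXMLElement.
$note = [
    "string0" => 'just a string',
    "string1" => "abc\n\n\n",
    "string2" => "\tdef",
    "string3" => "\n\n\n",
    "string4" => "\t\n",
];

$json = json_encode($note, JSON_PRETTY_PRINT);
print($json . "\n");

// Decode again and check the type of every value: with a plain array,
// whitespace-only strings are encoded as strings like all the others.
$decoded = json_decode($json, true);
foreach ($decoded as $key => $value) {
    echo $key, ': ', is_string($value) ? 'string' : gettype($value), "\n";
}
```

If this prints `string` for all five keys, the `{"0": ...}` wrapping in the question is coming from the SimpleXML side rather than from the string escaping itself.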

1 Answer


This comes from RFC 4627:

> All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Newline (`\n`) is U+000A (the single byte 0x0A in UTF-8), which falls inside that control-character range, so PHP dutifully converts it to its escaped JSON equivalent, `\n`.
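A quick way to see this escaping in isolation is to encode the strings directly, with no SimpleXML involved. This sketch only calls `json_encode`/`json_decode` on plain strings; the variable names are illustrative.

```php
<?php
// json_encode escapes the control characters U+0000..U+001F, so a real
// newline byte (0x0A) is written out as the two characters "\" and "n"
// inside the JSON string.
$raw = "abc\n\n\n";            // contains three real newline characters
$encoded = json_encode($raw);   // JSON text with escaped newlines

// In PHP single quotes \n stays literal, so this is the escaped form.
var_dump($encoded === '"abc\n\n\n"');

// Tabs are escaped the same way.
var_dump(json_encode("\tdef") === '"\tdef"');

// Decoding turns the escapes back into the original characters.
var_dump(json_decode($encoded) === $raw);
```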

PHP uses this RFC for `json_encode`:

> PHP implements a superset of JSON as specified in the original RFC 4627 - it will also encode and decode scalar types and NULL.
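The "superset" part can be checked the same way. This sketch encodes a few bare scalars and NULL as top-level values (RFC 4627's grammar only allowed an object or array as a JSON text) and confirms each one round-trips through `json_decode`. The sample values are arbitrary.

```php
<?php
// Arbitrary sample scalars plus NULL, each encoded on its own as a
// top-level JSON value.
$samples = ["just a string", 42, 1.5, true, null];

foreach ($samples as $value) {
    $json = json_encode($value);
    // Print the JSON text and whether decoding gives back the same value.
    var_dump($json, json_decode($json) === $value);
}
```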

As I noted in the comments, all versions of PHP, going back to 5.2, do it this way (Demo).

Machavity
  • Strange that this causes [decoding problems for others](http://stackoverflow.com/q/42068/4233593) when unescaped as `\n` instead of `\\n`. – Jeff Puckett Jul 20 '16 at 17:52
  • I might be missing something, but how does this character encoding end up in `{"0": "\n\n\n"}` form instead of as a plain string? – Vallieres Jul 20 '16 at 18:51
  • @Vallieres I think that's due to the SimpleXML conversion. If you put it into an array like I did, it doesn't do that. I shoved the SXML back in for kicks, reran it and got all sorts of wackiness: https://3v4l.org/kKfrL – Machavity Jul 20 '16 at 18:55
  • Yes, exactly my problem. :( Your explanation is interesting, but I'm wondering about the reasoning behind the SimpleXML -> JSON conversion, and possibly a way to fix this (without preg_replace'ing, of course). – Vallieres Jul 20 '16 at 18:59
  • The root of that problem is probably that SXML is an object that contains other objects (and sometimes arrays). So if something comes along and tries to iterate it, you get some really weird results. So I'm not surprised it grabbed an internal array and added it in. – Machavity Jul 20 '16 at 19:16
  • @Machavity Actually, it's kind of the opposite: the SimpleXML object *doesn't* contain other objects and arrays, it just produces them on demand. The actual representation internally is native to libxml2 and not like anything PHP knows about. – IMSoP Aug 02 '16 at 16:41
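As for a fix without `preg_replace`, one option is to stop handing the `SimpleXMLElement` to `json_encode` directly and build a plain array first, casting each child to string yourself. A minimal sketch, assuming a flat document like the one in the question (no nested elements, no repeated child names, attributes ignored); `sxeChildrenToArray` is a hypothetical helper name, not part of SimpleXML. It avoids return type declarations so it should also run on the PHP 5.4 to 7.0 range discussed above.

```php
<?php
// Hypothetical helper: turn the direct children of a flat SimpleXMLElement
// into a plain array of strings. Assumes no nested elements and no
// repeated child names (a repeated name would overwrite the earlier one);
// attributes are ignored.
function sxeChildrenToArray(SimpleXMLElement $element)
{
    $result = array();
    foreach ($element->children() as $name => $child) {
        // Casting to string returns the element's text content,
        // whitespace-only text included.
        $result[$name] = (string) $child;
    }
    return $result;
}

$note = new SimpleXMLElement('<Note></Note>');
$note->addChild("string0", 'just a string');
$note->addChild("string1", "abc\n\n\n");
$note->addChild("string2", "\tdef");
$note->addChild("string3", "\n\n\n");
$note->addChild("string4", "\t\n");

$json = json_encode(sxeChildrenToArray($note), JSON_PRETTY_PRINT);
print($json);
```

Since `json_encode` now only sees plain PHP strings, `string3` and `string4` are escaped exactly like `string1` and `string2`. Nested elements or repeated names would need a recursive helper that decides how to represent them, which is the mapping problem IMSoP describes above.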