While writing a JSON parser in Java I ran into a "cosmetic" problem:

The JSON specification says that its escape sequences for control characters, such as \n or \t, are the same as in C and Java. The problem I ran into is that when a JSON string (the part within the quotes, as in "property":"value") contains raw control characters, the displayed JSON is messed up because those characters affect the printout: \n starts a new line and \t inserts a tab.

An example:

String s = "{\n\t\"property1\": \"The quick brown fox\njumps over the lazy dog\",\n\t\"property2\": \"value2\"\n}";

Printing as:

    {
        "property1": "The quick brown fox
    jumps over the lazy dog",
        "property2": "value2"
    }

The solution would look like this:

String s = "{\n\t\"property1\": \"The quick brown fox\\njumps over the lazy dog\",\n\t\"property2\": \"value2\"\n}";

Printing "correctly" as:

    {
        "property1": "The quick brown fox\njumps over the lazy dog",
        "property2": "value2"
    }

So my question: Is it correct to treat control code outside strings differently than the control code within strings? And is it correct to add within JSON strings another backslash \ before any control characters, creating strings like "\n" or "\t" that won't have any effect on the look of JSON strings?

Marcus
  • Err, why don't you use a JSON library? – fge Apr 16 '14 at 14:24
  • Why are people building their own cars instead of buying one? I don't know, do you?... Well, actually because there is no car that can do what I want. ;) – Marcus Apr 16 '14 at 14:27
  • And what is it that you want which _prevents you_ from using a JSON library? – fge Apr 16 '14 at 14:28
  • Simplicity, speed, all kinds of stuff... But I'm here to discuss solutions to my question, not the question itself. – Marcus Apr 16 '14 at 14:31
  • Well in this case you should, really; unless you intend to write a string escaper of your own, you _should_ be using a JSON library which is sure to do the correct thing; Jackson, for instance, is plenty fast, and you'll have no errors writing JSON in strings by hand (hint: `JsonNodeFactory`) – fge Apr 16 '14 at 14:32
  • No, I won't. The only thing I'll do is reverse engineer Jackson, but I hoped for a quick answer here. Well, then... – Marcus Apr 16 '14 at 14:41
  • Unless someone volunteers, it is highly unlikely that you'll get an answer for _reinventing the wheel_! There are TONS of JSON libraries out there; want a light one? Try json-simple. Or, well, read RFC 7159 and good luck. But _why bother_? – fge Apr 16 '14 at 15:06
  • It's not reinventing the wheel. It's building my own car. Actually, I read RFC 7159 and it's one of the worst RFCs I've ever read. And as I said: the above issues aren't addressed in the JSON standard. So there isn't even a functioning wheel, you see? – Marcus Apr 16 '14 at 15:18
  • Above issues are all addressed in the RFC. You should really read it again. In particular the grammars. – fge Apr 16 '14 at 15:24
  • @Marcus Can you show us how you used Jackson? If you tried to deserialize your first String, yes, it will fail since it is not valid JSON. I don't think it should fail on your second. – Sotirios Delimanolis Apr 16 '14 at 16:05
  • Yes, you're right, the second one works, thanks! What does this mean? Well, it means that my above solution is correct. And RFC 7159 really does not address the issue that \n is transformed into \\n in JSON strings. It only writes about \n. – Marcus Apr 16 '14 at 16:13
  • @Marcus In section 7 of the [RFC](http://rfc7159.net/rfc7159) it states `except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).` and then lists those characters below. Is that what you're looking for? Remember, JSON string is **not** the same as a Java `String` literal. Other languages may have other means to escape the new line character. – Sotirios Delimanolis Apr 16 '14 at 16:17
  • Probably yes, but it's written badly and without any examples. However, here are some nice lines of code addressing this issue: http://stackoverflow.com/a/16652683/2012947 – Marcus Apr 16 '14 at 18:17
  • @fge - the majority of the JSON parsers are certainly not "sure to do the correct thing". See this: http://seriot.ch/parsing_json.php – GWR Apr 06 '17 at 00:27

1 Answer

Is it correct to treat control code outside strings differently than the control code within strings?

The JSON specification states

A JSON text is a sequence of tokens. The set of tokens includes six structural characters, strings, numbers, and three literal names.

The six structural characters are {, [, }, ], :, and ,. It then states

Insignificant whitespace is allowed before or after any of the six structural characters.

Your \n and \t are two of the four characters the spec defines as whitespace (space, horizontal tab, line feed, and carriage return), so you can put as many of them as you want around the above characters.

Outside JSON strings there is no notion of control characters: these four are just whitespace and carry no meaning. Inside a JSON string they are content. So yes, they are treated differently.
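
To make the difference concrete, here is a minimal sketch in Java. The class and method names (`JsonControlCharCheck`, `isJsonWhitespace`, `firstRawControlCharInString`) are made up for illustration and are not part of any library. It only tracks whether the scanner is currently inside a quoted string, so it is not a validating parser: a raw line feed or tab between tokens is skipped as whitespace, while the same character inside a string is reported.

```java
final class JsonControlCharCheck {

    // The four characters the JSON grammar allows as insignificant whitespace.
    static boolean isJsonWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Returns the index of the first raw control character (U+0000 through U+001F)
    // found inside a quoted JSON string, or -1 if there is none.
    // Characters outside strings are not inspected here.
    static int firstRawControlCharInString(String json) {
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                 // skip the character after the backslash
                } else if (c == '"') {
                    inString = false;    // closing quote
                } else if (c < 0x20) {
                    return i;            // raw control character inside a string
                }
            } else if (c == '"') {
                inString = true;         // opening quote
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Raw line feed inside the string value: not valid JSON.
        String raw = "{\n\t\"property1\": \"The quick brown fox\njumps over the lazy dog\"\n}";
        // Backslash followed by n inside the string value: valid JSON.
        String escaped = "{\n\t\"property1\": \"The quick brown fox\\njumps over the lazy dog\"\n}";

        System.out.println(firstRawControlCharInString(raw) >= 0);       // true
        System.out.println(firstRawControlCharInString(escaped) == -1);  // true
    }
}
```

Both inputs contain \n and \t between the tokens, and neither is flagged for that; only the line feed inside the quoted value of the first input is.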

And is it correct to add within JSON strings another backslash \ before any control characters, creating strings like "\n" or "\t" that won't have any effect on the look of JSON strings?

In your example, you are writing Java String literals, which have their own escaping rules. If you want the two characters \ and n to appear in the JSON string, you need to write \\n in the Java String literal, and similarly for the other escape sequences. A JSON generator must find every control character (U+0000 through U+001F), as well as every quotation mark and backslash, in the Java String it is converting to a JSON string and escape it accordingly. A JSON parser must find the escape sequence \n (or whatever else) in the JSON string it parses and convert it to the corresponding character in the Java String it creates.
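
As a rough sketch of both directions, assuming you really do want to hand-roll this instead of using a library: the class `JsonStringEscaper` and its `escape`/`unescape` methods below are hypothetical names, not an existing API. `escape` produces the body of a JSON string (without the surrounding quotes) and `unescape` reverses it. The handling of the four-hex-digit unicode escape uses `Integer.parseInt` and is therefore more lenient than a strict parser should be.

```java
final class JsonStringEscaper {

    // Generator side: turn a Java String value into the body of a JSON string.
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b");  break;
                case '\f': out.append("\\f");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters get the four-hex-digit form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Parser side: turn the body of a JSON string back into a Java String.
    static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = body.charAt(++i);
            switch (e) {
                case '"': case '\\': case '/': out.append(e); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    if (i + 4 >= body.length()) {
                        throw new IllegalArgumentException("truncated unicode escape");
                    }
                    // Lenient: parseInt accepts a few inputs a strict parser would reject.
                    out.append((char) Integer.parseInt(body.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("unknown escape: " + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "The quick brown fox\njumps over the lazy dog";
        String json = "{\"property1\": \"" + escape(value) + "\"}";
        System.out.println(json);   // the line feed appears as backslash + n, on one line
        System.out.println(unescape(escape(value)).equals(value));
    }
}
```

With this split, the Java String held in memory always contains the real line feed; the backslash form exists only in the JSON text that is written out or read in.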

Sotirios Delimanolis
  • Note: RFC 4627 is obsoleted by RFC 7159 – fge Apr 16 '14 at 15:24
  • To be honest, I know that chars outside strings can be ignored. My question actually addressed the issue that control chars like \n and \t within a JSON string affect the way a printout looks. But thank you anyway. – Marcus Apr 16 '14 at 15:32
  • @Marcus Right, they must be escaped within a JSON string. The JSON spec does not allow them un-escaped inside a JSON string. – Sotirios Delimanolis Apr 16 '14 at 15:38