45

I have an ASP.NET MVC action that is returning a JSON object.

The JSON:

{status: "1", message:"", output:"<div class="c1"><div class="c2">User generated text, so can be anything</div></div>"}

Currently my HTML is breaking it. There will be user generated text in the output field, so I have to make sure I escape ALL things that need to be escaped.

Does someone have a list of everything I need to escape?

I'm not using any JSON libraries, just building the string myself.

GEOCHET
Blankman

6 Answers

75

Take a look at http://json.org/. It gives a slightly different list of escape sequences than the one Chris proposed.

\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
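
To see how a standard serializer treats the characters in this list, here is a minimal sketch in Python (an assumption on my part, since the question is about C#; `show_escapes` is a made-up helper name). It asks Python's built-in `json` module for the string-literal form of each character.

```python
import json

# Characters that have a short escape on json.org.
samples = ['"', '\\', '/', '\b', '\f', '\n', '\r', '\t']

def show_escapes(chars):
    """Map each character to the JSON string literal json.dumps produces for it."""
    return {ch: json.dumps(ch) for ch in chars}

escaped = show_escapes(samples)
for ch, literal in escaped.items():
    print(f"U+{ord(ch):04X} -> {literal}")
```

Note that this serializer leaves `/` alone: `\/` is an escape a parser has to accept, not one a writer has to produce.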
Pashec
  • 7
    Except that it's completely unclear which characters should be encoded with \uxxxx sequence... – Pawel Veselov Mar 28 '13 at 08:04
  • 1
    And sort-of unclear what most of the others mean... (had to scroll back up to @ChrisNielsen's answer since I didn't recognize `\f`) – Izkata Sep 10 '13 at 21:28
  • `\uXXXX` escape codes specify a code point in the Basic Multilingual Plane (U+0000 through U+FFFF). See the official specification ["ECMA-404 The JSON Data Interchange Standard"](http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf) for more information. – ComFreek Apr 25 '14 at 14:20
  • Why aren't curly brackets escaped? If this is a json file and any key/value contains curly brackets then wouldn't it make the json structure invalid? – Mugen May 10 '16 at 12:01
  • 1
    @Mugen: No, an int or a bool or a float can't contain { or }. And a string is always between double quotes, so there is no confusion there, assuming of course your parser works correctly. – Stefan Steiger May 12 '16 at 07:26
  • @StefanSteiger Oh that explains the escape sequence for the double quotes. Thanks for explaining. :) – Mugen May 14 '16 at 13:53
55

Here is a list of special characters that you can escape when creating a string literal for JSON:

\b  Backspace (ASCII code 08)
\f  Form feed (ASCII code 0C)
\n  New line
\r  Carriage return
\t  Tab
\v  Vertical tab
\'  Apostrophe or single quote
\"  Double quote
\\  Backslash character

Reference: String literals

Some of these are more optional than others. For instance, your string should be perfectly valid whether you escape the tab character or leave in a tab literal. You should certainly be handling the backslash and quote characters, though.

bitsoflogic
Chris Nielsen
  • 3
    Escaping `/` is a good idea, too. At least when it's part of `</script>`. – ThiefMaster Oct 18 '11 at 23:40
  • 5
    These are that I *can* escape, which of them are that I *must* escape? – deerchao Jan 31 '12 at 07:33
  • 1
    And you must escape Tab when it's inside quotes, jsonlint.com says so, and jquery.parseJSON says so. – deerchao Jan 31 '12 at 07:36
  • this list is wrong. escaping ' would yield to an invalid JSON – Dexter Legaspi Jun 15 '12 at 12:45
  • Quite right, @DexterLegaspi, although I had not realized that until you pointed it out. Pashec's answer appears to be the most correct. – Chris Nielsen Jun 15 '12 at 14:14
  • 5
    -1 **This answer is wrong!** The reference you are using documents JavaScript's escape codes (whereas the OP asks about JSON escape codes). You can find the official list of escape codes for JSON on http://www.json.org/. While the two lists overlap, they are not identical. For example, `\'` is not a valid JSON escape code and causes validation errors when using [JSONLint](http://jsonlint.com) – ComFreek Apr 25 '14 at 14:17
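
The points raised in these comments are easy to check against a strict parser. A rough sketch, assuming Python's built-in `json` module as the parser (`is_valid_json` is a hypothetical helper, not something from the thread):

```python
import json

def is_valid_json(text):
    """Return True if Python's strict JSON parser accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

apostrophe_escape = is_valid_json(r'"it\'s"')  # \' is a JavaScript escape
vertical_tab_escape = is_valid_json(r'"a\vb"')  # \v is a JavaScript escape
raw_tab = is_valid_json('"a\tb"')               # literal tab character inside the string
tab_escape = is_valid_json(r'"a\tb"')           # the two-character sequence \t

print(apostrophe_escape, vertical_tab_escape, raw_tab, tab_escape)
```

Only the last input is accepted, which matches what deerchao and Dexter Legaspi report: `\'` and `\v` are not JSON escapes, and a tab inside a string has to be written as `\t`.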
16

As explained in section 9 of the official ECMA specification (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf), the following characters have to be escaped in JSON:

  • U+0022 (", the quotation mark)
  • U+005C (\, the backslash or reverse solidus)
  • U+0000 to U+001F (the ASCII control characters)

In addition, in order to safely embed JSON in HTML, the following characters also have to be escaped:

  • U+002F (/)
  • U+0027 (')
  • U+003C (<)
  • U+003E (>)
  • U+0026 (&)
  • U+0085 (Next Line)
  • U+2028 (Line Separator)
  • U+2029 (Paragraph Separator)

Some of the above characters can be escaped with the following short escape sequences defined in the standard:

  • \" represents the quotation mark character (U+0022).
  • \\ represents the reverse solidus character (U+005C).
  • \/ represents the solidus character (U+002F).
  • \b represents the backspace character (U+0008).
  • \f represents the form feed character (U+000C).
  • \n represents the line feed character (U+000A).
  • \r represents the carriage return character (U+000D).
  • \t represents the character tabulation character (U+0009).

The other characters which need to be escaped will use the \uXXXX notation, that is \u followed by the four hexadecimal digits that encode the code point.

The \uXXXX notation can also be used in place of a short escape sequence, or to optionally escape any other character from the Basic Multilingual Plane (BMP).
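
For anyone building the string by hand, the rules above can be sketched roughly as follows. This is Python rather than the asker's C#, and `escape_json_string`, `SHORT` and `HTML_UNSAFE` are names invented for the illustration; treat it as a sketch of the lists in this answer, not a vetted encoder.

```python
import json

# Short escape sequences defined by the standard.
SHORT = {'"': '\\"', '\\': '\\\\', '/': '\\/', '\b': '\\b', '\f': '\\f',
         '\n': '\\n', '\r': '\\r', '\t': '\\t'}

# Extra characters this answer lists for safe embedding in HTML.
HTML_UNSAFE = {"'", '<', '>', '&', '\u0085', '\u2028', '\u2029'}

def escape_json_string(value: str) -> str:
    """Return `value` as a quoted JSON string literal (hypothetical helper)."""
    out = []
    for ch in value:
        if ch in SHORT:
            out.append(SHORT[ch])                 # use the short form when one exists
        elif ord(ch) < 0x20 or ch in HTML_UNSAFE:
            out.append(f'\\u{ord(ch):04x}')       # otherwise fall back to \uXXXX
        else:
            out.append(ch)                        # everything else passes through
    return '"' + ''.join(out) + '"'

html_fragment = '<div class="c1">Tom & "Jerry"</div>'
literal = escape_json_string(html_fragment)
print(literal)
```

Parsing `literal` with a normal JSON parser gives back the original fragment, while the literal itself contains no raw `<`, `>` or `&`.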

Andrei Bozantan
5

From the spec:

All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus [backslash] (U+005C), and the control characters U+0000 to U+001F

Just because e.g. Bell (U+0007) doesn't have a single-character escape code does not mean that you don't need to escape it. Use the Unicode escape sequence \u0007.
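
A quick way to confirm the Bell case, assuming Python's built-in `json` module as the serializer and parser (`parses` is a hypothetical helper):

```python
import json

bell = "\u0007"  # BEL: a control character with no short escape

literal = json.dumps("ding" + bell)
print(literal)

def parses(text):
    """Return True if the strict parser accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

raw_ok = parses('"ding' + bell + '"')  # raw control character inside the string
escaped_ok = parses(literal)           # same content, written with \u0007
```

The serializer emits the six-character sequence `\u0007`, and the parser rejects the version that contains the raw control character.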

Kevin Smyth
  • Both your ECMA spec and RFC 4627 state the solidus must be escaped, then both go on to give examples where they are not! That's so confusing. Yours: `"/"` RFC: `"Url": "http://www.example.com/image/481989943",` Some comments I've seen suggest solidus _only_ needs to be escaped when following `<`, e.g.: `<\/tag>`, but if that's the case, why are the specs so aloof about it? Based on the RFC, I'm escaping the solidus, but I have been asked about it by people who are not used to seeing it in their documents. So I'm trying to give an intelligent answer, not speculation. Thanks. – Luv2code Jul 07 '15 at 05:07
  • The specs state that the reverse solidus must be escaped. They do not state that the solidus must be escaped. i.e. both `"\/"` and `"/"` are legal – Kevin Smyth Jul 07 '15 at 16:46
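
To illustrate that last comment, here is a small check (Python's built-in `json` module, chosen only for the demonstration) that the escaped and unescaped solidus decode to the same string:

```python
import json

escaped = json.loads(r'"<\/script>"')  # solidus written as \/
plain = json.loads('"</script>"')      # solidus written as /

same = escaped == plain
print(same)
```

Both forms are legal JSON and decode identically, so escaping `/` only matters for where the JSON text ends up, such as inside an HTML `<script>` block.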
5

Right away, I can tell that at least the double quotes in the HTML tags are gonna be a problem. Those are probably all you'll need to escape for it to be valid JSON; just replace

"

with

\"

As for outputting user-input text, you do need to make sure you run it through HttpUtility.HtmlEncode() to avoid XSS attacks and to make sure that it doesn't screw up the formatting of your page.
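
The two steps are separate: HTML-encode the user text first, then JSON-escape the finished markup. A rough sketch in Python, where `html.escape` stands in for `HttpUtility.HtmlEncode` (an analogy, not the same API) and `build_response` is a made-up helper; letting a serializer produce the JSON is a different route from the hand-built string in the question.

```python
import html
import json

def build_response(user_text: str) -> str:
    """Hypothetical helper: HTML-encode user text, wrap it in markup, serialize as JSON."""
    safe_text = html.escape(user_text)  # encodes & < > " and '
    markup = f'<div class="c1"><div class="c2">{safe_text}</div></div>'
    # The serializer takes care of the double quotes in the class attributes.
    return json.dumps({"status": "1", "message": "", "output": markup})

payload = build_response('<script>alert("hi")</script>')
print(payload)
```

After parsing `payload`, the `output` field holds the markup with the user's text rendered harmless as entities.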

Jarett Millard
4

The JSON reference states:

 any-Unicode-character-
     except-"-or-\\-or-
     control-character

Then lists the standard escape codes:

  \" Standard JSON quote
  \\ Backslash (Escape char)
  \/ Forward slash
  \b Backspace (ascii code 08)
  \f Form feed (ascii code 0C)
  \n Newline
  \r Carriage return
  \t Horizontal Tab
  \u four-hex-digits

From this I assumed that I needed to escape all the listed characters and that everything else is optional. You can encode every character as \uXXXX if you wish, or only the non-printable 7-bit ASCII characters, or only the characters whose Unicode value falls outside the range \u0020 <= x <= \u007E (32 - 126). Prefer the short escape codes where they exist: they are shorter, which helps readability and performance.

Additionally you can read point 2.5 (Strings) from RFC 4627.

You may (or may not) want to (further) escape other characters depending on where you embed that JSON string, but that is outside the scope of this question.
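
The optional nature of `\uXXXX` for ordinary characters can be seen with a serializer that supports both styles. A minimal sketch, assuming Python's built-in `json` module and its `ensure_ascii` parameter:

```python
import json

text = "caf\u00e9 \u20ac"  # "café €": two characters outside 7-bit ASCII

ascii_only = json.dumps(text)                 # default: non-ASCII becomes \uXXXX
as_is = json.dumps(text, ensure_ascii=False)  # non-ASCII characters left unescaped

print(ascii_only)
print(as_is)
```

Both outputs are valid JSON and decode to the same string; the choice only affects the size and readability of the encoded text.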

Marius