Meaning of escaped unicode characters in JSON

Question

In JSON, Unicode characters can be escaped using the \uXXXX notation. I assume the XXXX refers to a Unicode code point in hexadecimal.

But since there are only 4 digits, does this mean there is no way to escape code points greater than 0xFFFF?

Or does \uXXXX not encode abstract code points at all, but rather units of UTF-16-BE encoded bytes?

JavaScript Unicode representation is sort-of broken. – Pointy Feb 24 '14 at 18:10
[JavaScript has a Unicode problem.](http://mths.be/jsu) – Mathias Bynens Feb 25 '14 at 06:30

Brett Zamir · Accepted Answer · 2014-02-24T18:33:28.273


The notation is \uXXXX, and yes, it is possible to represent characters greater than 0xFFFF by writing a high surrogate followed by a low surrogate, each as its own \uXXXX escape, along the lines you mention.

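var s = '\uD87E\uDC04'; // surrogate pair for the single code point U+2F804
alert(s + '::' + s.length); // shows the character followed by "::2" (length counts UTF-16 code units)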
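
To make the surrogate arithmetic concrete, here is a minimal sketch of going the other way: starting from a code point and producing the JSON escape text. The helper name `toJsonEscape` is hypothetical (it is not part of any standard API), and the sketch assumes the input is a valid code point that is not itself a surrogate.

```javascript
// Hypothetical helper (not a built-in): escape one code point as JSON \uXXXX text.
function toJsonEscape(codePoint) {
  // Format one 16-bit unit as a backslash, 'u', and four uppercase hex digits.
  var hex4 = function (unit) {
    return '\\u' + ('0000' + unit.toString(16).toUpperCase()).slice(-4);
  };
  if (codePoint <= 0xFFFF) {
    return hex4(codePoint);               // fits in a single escape
  }
  var offset = codePoint - 0x10000;       // 20 bits remain
  var high = 0xD800 + (offset >> 10);     // top 10 bits -> high surrogate
  var low = 0xDC00 + (offset & 0x3FF);    // bottom 10 bits -> low surrogate
  return hex4(high) + hex4(low);
}

var escaped = toJsonEscape(0x2F804);           // the 12 characters \uD87E\uDC04
var parsed = JSON.parse('"' + escaped + '"');  // JSON decodes the pair into one character
var sameAsLiteral = parsed === '\uD87E\uDC04'; // matches the string literal above
```

`parsed.length` is still 2 here, because the length counts UTF-16 code units rather than code points.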

edited Feb 24 '14 at 18:33

answered Feb 24 '14 at 18:18

Brett Zamir

So you are saying that the `\uXXXX` notation is actually UTF-16 specifically? – Siler Feb 24 '14 at 18:18

You might see http://stackoverflow.com/questions/8715980/javascript-strings-utf-16-vs-ucs-2 for the latter question. `charAt()`, for example, won't grab a whole abstract code point, so in that sense it may seem pre-UTF-16, but with surrogate support, JS can produce the necessary characters. How things are encoded internally (which may or may not be UTF-16), or in the document (which could be UTF-8, etc.), is a separate matter from how the JS API works. – Brett Zamir Feb 24 '14 at 18:32
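
The difference between code-unit and code-point access can be sketched as follows. This assumes an engine with ES2015 support, since `codePointAt` and `Array.from` were standardized after this discussion took place.

```javascript
var s = '\uD87E\uDC04';             // one code point (U+2F804), two UTF-16 code units

var firstUnit = s.charAt(0);        // only the lone high surrogate, not the whole character
var unitValue = s.charCodeAt(0);    // 0xD87E, the value of that code unit
var codePoint = s.codePointAt(0);   // 0x2F804, the full code point (ES2015)
var byCodePoint = Array.from(s);    // string iteration is by code point: one element
```

The older index-based methods work on code units, while the ES2015 additions reassemble surrogate pairs into code points.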
