In JSON, Unicode characters can be escaped using the \uXXXX notation. I assume the XXXX refers to a Unicode code point in hexadecimal.

But since there are only 4 digits, does this mean there is no way to escape code points greater than 0xFFFF?

Or does \uXXXX encode not abstract code points but UTF-16 code units (as in UTF-16BE)?

Siler

1 Answer

It should be \uXXXX, and yes, it is possible to represent characters greater than 0xFFFF: each one is written as a high surrogate followed by a low surrogate, that is, two \uXXXX escapes that hold UTF-16 code units, along the lines you mention.

var s = '\uD87E\uDC04';
alert(s + '::' + s.length); // shows the character U+2F804 followed by "::2"
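
To make the surrogate arithmetic concrete, here is a minimal sketch of how a code point above 0xFFFF could be turned into the two \uXXXX escapes, and how `JSON.parse` reads them back. The helper name `toJsonEscape` is hypothetical (it is not part of any library), and it does not validate its input; `JSON.parse`, `codePointAt`, and `Array.from` are standard JavaScript.

```javascript
// Hypothetical helper: format a code point as JSON-style \uXXXX escape(s).
// Code points above 0xFFFF are split into a UTF-16 high/low surrogate pair.
function toJsonEscape(codePoint) {
  const hex4 = (unit) => unit.toString(16).toUpperCase().padStart(4, '0');
  if (codePoint <= 0xFFFF) {
    return '\\u' + hex4(codePoint);
  }
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
  return '\\u' + hex4(high) + '\\u' + hex4(low);
}

// U+2F804 is the character used in the snippet above.
const escaped = toJsonEscape(0x2F804);             // backslash-u D87E, backslash-u DC04
const parsed = JSON.parse('"' + escaped + '"');    // JSON text -> JS string

const lengthInCodeUnits = parsed.length;           // counts UTF-16 code units
const codePoint = parsed.codePointAt(0);           // reads the full code point
const codePointCount = Array.from(parsed).length;  // iterates by code point

console.log(escaped, lengthInCodeUnits, codePoint.toString(16), codePointCount);
```

So the escape carries UTF-16 code units, while `codePointAt` and string iteration recover the single abstract code point from the pair.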
Brett Zamir
  • So you are saying that the `\uXXXX` notation is actually UTF-16 specifically? – Siler Feb 24 '14 at 18:18
  • You might see http://stackoverflow.com/questions/8715980/javascript-strings-utf-16-vs-ucs-2 for the latter question. `charAt()`, for example, won't grab a whole abstract code point, so in that sense JS may seem pre-UTF-16, but with surrogate support it can produce the necessary characters. How strings are encoded internally (which may or may not be UTF-16), or how the document is encoded (which could be UTF-8, etc.), are different matters from how the JS API works. – Brett Zamir Feb 24 '14 at 18:32