I'd like to write the Unicode character U+10428 as a literal in Java. http://www.marathon-studios.com/unicode/U10428/Deseret_Small_Letter_Long_I

I tried with '\u10428' and it doesn't compile.

hippietrail
kawty

1 Answer

Java committed fully to Unicode back when people thought 64K code points would be enough for everyone (where have we heard that before?), so it started out with UCS-2 and was later upgraded to UTF-16.

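But the language never gained an escape sequence for Unicode characters outside the BMP (Basic Multilingual Plane): \u takes exactly four hex digits, so '\u10428' is read as \u1042 followed by the digit 8, which is two characters and not a valid char literal.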

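Thus, to write it as an escape in a literal, your only recourse is to manually recode the code point as a UTF-16 surrogate pair and use two UTF-16 escapes, in a String rather than a char, since a char holds only one UTF-16 code unit.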

Your example code point U+10428 becomes "\uD801\uDC28".

I used this site for the recoding: https://rishida.net/tools/conversion/
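The recoding can also be done in a few lines of Java instead of on a website. The following is a minimal sketch, not a definitive implementation: the class name SurrogatePairSketch and the methods toSurrogates and toEscapes are made up for illustration. The arithmetic is the standard UTF-16 encoding (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves). The JDK's own Character.highSurrogate and Character.lowSurrogate (Java 7+) should give the same two code units.

```java
public class SurrogatePairSketch {
    // Hypothetical helper: splits a supplementary code point
    // (U+10000..U+10FFFF) into its two UTF-16 code units.
    public static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    // Hypothetical helper: renders the pair as text that can be pasted
    // into a Java string literal (backslash, 'u', four hex digits, twice).
    public static String toEscapes(int codePoint) {
        char[] pair = toSurrogates(codePoint);
        return String.format("\\u%04X\\u%04X", (int) pair[0], (int) pair[1]);
    }

    public static void main(String[] args) {
        // Prints the two escapes for the code point from the question.
        System.out.println(toEscapes(0x10428));
    }
}
```

For 0x10428 the offset is 0x0428, so the high surrogate is 0xD800 + 1 = 0xD801 and the low surrogate is 0xDC00 + 0x28 = 0xDC28, matching the escapes above.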

Quote from the Java Language Specification:

3.10.5 String Literals

A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.
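To see what that rule means in practice, here is a small sketch (the class name SupplementaryLiteralCheck and the constant LONG_I are illustrative assumptions, not anything from the spec). It checks that the two-escape literal is two UTF-16 code units but a single code point, and shows two runtime alternatives for cases where a literal is not strictly required.

```java
public class SupplementaryLiteralCheck {
    // The literal from the answer: two escapes, one per UTF-16 surrogate.
    public static final String LONG_I = "\uD801\uDC28";

    public static void main(String[] args) {
        System.out.println(LONG_I.length());                            // 2 (code units)
        System.out.println(LONG_I.codePointCount(0, LONG_I.length()));  // 1 (code point)
        System.out.println(Integer.toHexString(LONG_I.codePointAt(0))); // 10428

        // Runtime alternatives that avoid hand-computed escapes.
        String viaToChars = new String(Character.toChars(0x10428));
        String viaBuilder = new StringBuilder().appendCodePoint(0x10428).toString();
        System.out.println(LONG_I.equals(viaToChars) && LONG_I.equals(viaBuilder)); // true
    }
}
```

Note that the runtime alternatives are not compile-time constants, so they cannot be used where a constant expression is required (for example in a switch case label or an annotation value).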

Deduplicator
    In Java, it cannot. Anyway, be careful with "one character": depending on context (which is sometimes absent or too ambiguous), it can mean any of byte, code unit, code point, or grapheme. – Deduplicator Jul 08 '14 at 18:16