Since Java holds characters internally as UTF-16, what should you do if you need to output in an encoding that includes characters which are not in Unicode at all?

mentics
  • All Unicode characters are encodable in UTF-16 (and UTF-8, for that matter). – leonbloy Nov 19 '12 at 20:15
  • See also here: http://stackoverflow.com/questions/9699071/what-is-the-javas-internal-represention-for-string-modified-utf-8-utf-16 – leonbloy Nov 19 '12 at 20:19
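
To illustrate the comment above, here is a minimal sketch showing that a character outside the BMP survives a round trip through both UTF-8 and UTF-16. The class name `RoundTripDemo` and the helper `roundTrip` are made up for this example; only standard `java.lang` and `java.nio.charset` calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Hypothetical helper: encode the text to bytes in the given charset,
    // then decode those bytes back into a String.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // U+1F600 is above U+FFFF, so it lies outside the BMP.
        String s = new String(Character.toChars(0x1F600));

        // Number of UTF-16 code units (Java chars) in the string.
        System.out.println(s.length());
        // Number of Unicode code points in the string.
        System.out.println(s.codePointCount(0, s.length()));

        // Both encodings can represent the character, so nothing is lost.
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s));
        System.out.println(roundTrip(s, StandardCharsets.UTF_16).equals(s));
    }
}
```

The string has two code units but one code point, which is the distinction between "UTF-16" and "BMP" that the comments touch on.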

1 Answer

Java can only handle characters which are present in Unicode, basically. Text outside the BMP (i.e. above U+FFFF) is encoded as surrogate pairs (as each char is a UTF-16 code unit)... but if you want characters which aren't in Unicode at all, you're on your own - you could probably find some area of Unicode which is reserved for private use, and map the characters there... but you may well have "fun" in all kinds of odd ways.
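
A rough sketch of both ideas, under stated assumptions: the first part checks that a character above U+FFFF really is stored as a surrogate pair, and the second part hands out code points from the BMP Private Use Area (U+E000 to U+F8FF) for characters that have no Unicode equivalent. The class `PrivateUseDemo`, the method `codePointFor`, and the glyph name are all hypothetical; a real mapping would have to be agreed with whatever consumes the text, since private-use code points carry no standard meaning.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PrivateUseDemo {
    // The BMP Private Use Area runs from U+E000 to U+F8FF.
    static final int PUA_FIRST = 0xE000;
    static final int PUA_LAST = 0xF8FF;

    // Hypothetical registry: names of non-Unicode characters -> PUA code points.
    private static final Map<String, Integer> assigned = new LinkedHashMap<>();

    // Returns the code point already assigned to the name, or assigns the next free one.
    static synchronized int codePointFor(String name) {
        Integer existing = assigned.get(name);
        if (existing != null) {
            return existing;
        }
        int next = PUA_FIRST + assigned.size();
        if (next > PUA_LAST) {
            throw new IllegalStateException("Private Use Area exhausted");
        }
        assigned.put(name, next);
        return next;
    }

    public static void main(String[] args) {
        // A character above U+FFFF occupies two chars: a surrogate pair.
        char[] units = Character.toChars(0x1D11E); // MUSICAL SYMBOL G CLEF
        System.out.println(Character.isSurrogatePair(units[0], units[1]));

        // Map a made-up, non-Unicode character into the private-use range.
        int cp = codePointFor("my-custom-glyph");
        System.out.println(Integer.toHexString(cp));
        System.out.println(Character.getType(cp) == Character.PRIVATE_USE);
    }
}
```

Fonts, collation, and other programs will not know what these private-use code points mean, which is one source of the "fun" mentioned above.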

Do you definitely need to handle characters which aren't in Unicode? I thought it covered almost everything these days...

Jon Skeet
  • Perhaps he/she believes that UTF-16 = BMP? – leonbloy Nov 19 '12 at 20:18
  • My impression is that it covers ridiculously **more** than everything :) – Marko Topolnik Nov 19 '12 at 20:24
  • @leonbloy: Possibly. That's why I mentioned the possibility of using surrogate pairs :) Maybe we'll get more details soon... – Jon Skeet Nov 19 '12 at 20:29
  • "if you want characters which aren't in Unicode at all..." yes, that's my question. Maybe limiting support to UTF-16 would be sufficient... but there's a whole lot of craziness out there, so I was trying to find out what we should do on the off chance someone was using some obscure encoding with a wacky character. – mentics Nov 20 '12 at 00:22
  • @taotree: Well that encoding wouldn't be fully supported by Java anyway. Do you have any *concrete* concerns - any specific situations which you *know* are required but not supported? If not, I suggest you don't worry about it. – Jon Skeet Nov 20 '12 at 06:52