
Someone asked a similar question, but I didn't really understand the answer.

When I write char myChar = 'k' in Java, is it going to reserve 16 bits for it (according to the Java docs below)?

http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html

Now let's say I have a Unicode character '電' and assume that its code point is something like U+FFFF1. This code point cannot be stored in 2 bytes, so would Java allocate extra bytes for it (as it would in a UTF-16 based string)?

In short, when I have something like this:

char myChar = '電';

assuming that its code point is large enough to need more than 2 bytes, how many bits will myChar have: 16 or 32?

Thanks

Community
Tintin
  • If you didn’t get the answer, ask for clarification there. Don’t post duplicates; they make it more difficult to find good answers to questions that have been asked, when each clone exists independently of the others, with its own set of answers. – Jukka K. Korpela Sep 02 '14 at 21:37
  • I agree! But in this case, that question was 4 years old (with an answer already selected) and didn't actually talk about code points. – Tintin Sep 02 '14 at 21:43
  • The accepted answer to the old question starts with “Java Strings are UTF-16 (big endian), so a Unicode code point can be one or two characters”. If a clarification is needed, it should be made to the answers to an existing question. – Jukka K. Korpela Sep 03 '14 at 04:56

1 Answer


Java uses UTF-16, and yes, every Java char is 16 bits. From the Java Tutorial - Primitive Data Types,

char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
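One quick way to confirm the size and range the tutorial describes is to print the constants on java.lang.Character. This is a minimal sketch; the class name CharSize is only an illustrative choice, and Character.BYTES assumes Java 8 or later.

```java
public class CharSize {
    public static void main(String[] args) {
        // Character.SIZE is the number of bits used to represent a char value
        System.out.println("bits per char: " + Character.SIZE);   // 16
        // Character.BYTES (Java 8+) is the same size expressed in bytes
        System.out.println("bytes per char: " + Character.BYTES); // 2
        // Smallest and largest values a char can hold, printed as ints
        System.out.println("min: " + (int) Character.MIN_VALUE);  // 0
        System.out.println("max: " + (int) Character.MAX_VALUE);  // 65535
    }
}
```

Nothing in the type changes with the value stored: every char variable has exactly these 16 bits, whether it holds 'k' or anything else up to '\uffff'.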

Further, the Character Javadoc says (in part),

The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value if followed by any low-surrogate value in a string would represent a letter.

The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
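The two Javadoc examples can be run directly to see the difference between the char overload and the int (code point) overload. This is a small sketch; the class name SurrogateDemo is illustrative, and Character.isSupplementaryCodePoint is added only to show that 0x2F81A lies above U+FFFF.

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // A lone high surrogate passed as a char is not a letter by itself
        System.out.println(Character.isLetter('\uD840'));                // false
        // The int overload takes a full code point, including supplementary ones
        System.out.println(Character.isLetter(0x2F81A));                 // true
        // Code points above U+FFFF are called supplementary
        System.out.println(Character.isSupplementaryCodePoint(0x2F81A)); // true
    }
}
```

This is why code that needs to handle every Unicode character generally works with int code points rather than char values.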

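So a supplementary character (like the hypothetical U+FFFF1 in your second example) cannot be represented as a single 16-bit char. The variable myChar is always 16 bits; a supplementary character simply doesn't fit in it, and inside a String it is stored as two chars (a surrogate pair).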
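The sketch below contrasts a BMP character with a supplementary one. Note that '電' itself is U+96FB, inside the Basic Multilingual Plane, so it does fit in one char; the code writes it as a Unicode escape so the example compiles regardless of source-file encoding. U+2F81A (from the Javadoc) and U+FFFF1 (from the question) both need two chars. The class name CodePointDemo is illustrative only.

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // '電' written as a Unicode escape; it is U+96FB and fits in one char
        char den = '\u96FB';
        System.out.printf("U+%04X%n", (int) den);              // U+96FB
        System.out.println(Character.charCount(den));          // 1

        // A supplementary code point becomes two chars (a surrogate pair)
        int cp = 0x2F81A;
        String s = new String(Character.toChars(cp));
        System.out.println(s.length());                        // 2 chars
        System.out.println(s.codePointCount(0, s.length()));   // 1 code point
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
        System.out.println(s.codePointAt(0) == cp);            // true

        // The question's hypothetical U+FFFF1 would also need two chars
        System.out.println(Character.charCount(0xFFFF1));      // 2
    }
}
```

In other words, the answer to "16 or 32" is that the char stays 16 bits; a character that needs more is held as two chars in a String, or as one int code point.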

Remy Lebeau
Elliott Frisch
  • Thanks! I did not know about supplementary character limitation with single char variables. – Tintin Sep 02 '14 at 20:36