Could you please give me more information about why Unicode was selected as the default value for the char data type? Is there any specific reason behind this?
It was recognized that the language that was to become Java needed to support multilingual character sets by default. At that time, Unicode was the new standard way of doing it [1]. When Java first adopted Unicode, Unicode used 16-bit codes exclusively. That caused the Java designers to specify char as an unsigned 16-bit integral type. Unfortunately, Unicode rapidly expanded beyond 16 bits, and Java had to adapt by switching to UTF-16 as its native in-memory text encoding scheme.
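To make the 16-bit point concrete, here is a minimal sketch using only standard java.lang.Character members. The class name CharWidthDemo and the helper charsNeeded are made up for illustration. It shows that char is an unsigned 16-bit type and that a code point above U+FFFF needs two char values (a surrogate pair) in UTF-16.

```java
final class CharWidthDemo {
    // Hypothetical helper: how many char units UTF-16 needs for one code point.
    static int charsNeeded(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // 16 (bits per char)
        System.out.println((int) Character.MIN_VALUE); // 0 (char is unsigned)
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println(charsNeeded('A'));          // 1
        System.out.println(charsNeeded(0x1F600));      // 2 (surrogate pair)
    }
}
```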
For more background:
But note that:
- Since Java 9, the JVM can use a more compact in-memory representation for text data (compact strings).
- The width of char is so hard-wired that it would be impossible to change. In fact, if you want to represent a Unicode code point, you should use an int rather than a char.
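As a rough illustration of the int-versus-char point, the sketch below uses the standard String methods length, codePointCount and codePointAt. The class name CodePointDemo and its helper are invented for this example. The string holds the letter a followed by U+1F600, written as an explicit surrogate pair so the source file encoding does not matter.

```java
final class CodePointDemo {
    // Hypothetical helper: counts Unicode code points rather than char units.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";

        System.out.println(s.length());        // 3 char units
        System.out.println(codePointCount(s)); // 2 code points

        // The code point does not fit in a char, so it is read into an int.
        int cp = s.codePointAt(1);
        System.out.println(Integer.toHexString(cp)); // 1f600
    }
}
```

Indexing with charAt(1) here would return only the high surrogate, which is why code-point-aware methods work with int.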
1 - It is still the standard way. AFAIK there are no credible alternatives to Unicode at this time.
The specific reason that \u0000 was chosen as the default initial value for char is that it is zero. Objects are default-initialized by writing all-zero bytes to all fields, irrespective of their types. This maps to zero for integral and floating-point types, false for boolean, and null for reference types.
It so happens that the \u0000 character maps to the ASCII NUL control character, which is a non-printing character.
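If you want to see the default values for yourself, a small sketch like the following should do. The class DefaultValuesDemo and its field names are made up for the example. The fields are deliberately left unassigned so that they keep their default initial values.

```java
final class DefaultValuesDemo {
    // Deliberately uninitialized: each field keeps its default value.
    char c;
    int i;
    double d;
    boolean b;
    String s;

    public static void main(String[] args) {
        DefaultValuesDemo o = new DefaultValuesDemo();

        System.out.println(o.c == '\u0000'); // true
        System.out.println((int) o.c);       // 0
        System.out.println(o.i);             // 0
        System.out.println(o.d);             // 0.0
        System.out.println(o.b);             // false
        System.out.println(o.s);             // null

        // Array elements are default-initialized the same way.
        char[] chars = new char[2];
        System.out.println((int) chars[1]);  // 0
    }
}
```

Note that this applies to fields and array elements only. Local variables have no default value and must be assigned before use.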