Because char is a number. It's an unsigned 16-bit number: a value between 0 and 65535 (inclusive).
Let's try it:
char c = 'a';
int y = c; // this.. works?
System.out.println(y); // this prints.. 97?
System.out.println(c); // Phew, this prints 'a' at least
System.out.println((int) 'a'); // also 97, and.. compiles?
So where's this 97 coming from?
The Unicode table. Computers are, in the end, bit based. With bits we can represent numbers, and with numbers we can represent characters. Which number represents which character? Well, you tell me. In the olden days the answer depended on the country you bought that computer from, or the language you configured your OS to be when you installed it. Mostly because it was too inefficient to attempt to represent characters with anything but a byte, and bytes only cover 0-255. There are more characters in use on the planet than that, so the German computers had a number for the ü character, and the Icelandic ones for the ð, and the Turkish ones for the dotless ı, and so on. The Russian one was completely different (Cyrillic); now imagine the Chinese and Japanese ones.
Unicode fixes this by having one table for all of it. Naturally, the table is much larger than 256 entries.
97 is the Unicode table id (the 'code point') for the 'a' character.
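To see the table lookup go both ways, here is a minimal sketch. The class and method names (CodePointDemo, codeOf, charOf) are made up for illustration; only the casts themselves are plain Java.

```java
final class CodePointDemo {
    // Returns the number stored in a char. No cast needed: char widens to int.
    static int codeOf(char c) {
        return c;
    }

    // Turns a number back into the char with that value.
    // The cast is required here because int is 'bigger' than char.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('a'));      // 97
        System.out.println(codeOf('\u00FC')); // 252, the table entry for ü
        System.out.println(charOf(98));       // b
    }
}
```

The ü is written as the escape \u00FC so the example does not depend on the encoding your source file is saved in.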
The Unicode value for the character '0' is 48. Fortunately, '1' is 49, and so on, so if you want to translate '5' to 5, subtract 48. Which is hard to remember, except you don't have to: '0' is 48, in the same way that 0x10 and 16 and 020 are all the same number, just in different writing styles.
So you can just write:
int v = '5' - '0';
System.out.println(v); // Prints 5!
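The same subtraction trick scales up to whole strings of digits. This is only a sketch: DigitDemo and parseDigits are hypothetical names, it handles just the ASCII digits '0' to '9', and it does not guard against int overflow. In real code Integer.parseInt or Character.digit is probably what you want.

```java
final class DigitDemo {
    // Converts a string of ASCII digits to an int by hand,
    // using (ch - '0') to turn each digit character into its value.
    static int parseDigits(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < '0' || ch > '9') {
                throw new IllegalArgumentException("not a digit: " + ch);
            }
            result = result * 10 + (ch - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseDigits("2024"));       // 2024
        System.out.println(Character.digit('5', 10));  // 5, the library way
    }
}
```

Note that the range check is itself a comparison between numbers: ch < '0' compares ch against 48.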
So why does System.out.println('a') not print 97?
Because println is coded like that. println is overloaded: there are many methods all named println, and in Java the parameter types are effectively part of the method's name. So you get the char version of it, which looks the number up in the Unicode table and then prints that character. You're still passing 97 to the method. It's just that the method reacts to being passed 97 by printing 'a', not by printing 97 (which is what the println variant that takes an int does).
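You can reproduce this with your own overloads. In the sketch below, OverloadDemo and describe are invented names standing in for println; the point is only that the compiler picks the version based on the static type of the argument.

```java
final class OverloadDemo {
    // Chosen when the argument's type is char.
    static String describe(char c) {
        return "char: " + c;
    }

    // Chosen when the argument's type is int.
    static String describe(int i) {
        return "int: " + i;
    }

    public static void main(String[] args) {
        char c = 'a';
        System.out.println(describe(c));       // char: a
        System.out.println(describe((int) c)); // int: 97
        System.out.println(describe(c + 1));   // int: 98, because char + int is an int
    }
}
```

The last line is the one that tends to surprise people: arithmetic on a char gives you an int, so you silently end up in the int overload.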
So why can I return it if my method's return type is int?
Because Java has silent widening (the spec calls it a widening primitive conversion). Anytime you use a numeric type A when what you actually need is numeric type B, but A is 'smaller' than B (B can represent everything A can, and more), then it is not an error; Java simply assumes you meant to convert it and injects that conversion for you:
byte b = 10;
int c = b; // legal.
int d = (int) b; // the same thing, de-syntax-sugared
Given that char represents 0-65535, and int can represent from -2147483648 to +2147483647, every char fits, and therefore:
char c = 'a'; // legal
c = 97; // so is this: 97 is a constant that fits in a char
int d = c; // and so is this
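The other way around doesn't work, because a byte (-128 to 127) can't hold every char:
char c = 'a';
// byte b = c; // nope: does not compile
byte b = (byte) c; // legal, thanks to the explicit cast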
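The reason the compiler insists on that cast is that it can lose information. A small sketch, with ConversionDemo, widen and narrow as made-up names:

```java
final class ConversionDemo {
    // char -> int: always safe, so Java inserts the conversion for you.
    static int widen(char c) {
        return c;
    }

    // char -> byte: may lose information, so you have to spell out the cast.
    // Only the lowest 8 bits survive, read as a signed value.
    static byte narrow(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        System.out.println(widen('a'));        // 97
        System.out.println(narrow('a'));       // 97, it happens to fit
        System.out.println(narrow('\u00FC'));  // -4: ü is 252, which does not fit in a byte
    }
}
```

So the cast is you telling the compiler "I know this might mangle the value, do it anyway".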
Isn't Unicode much larger than 65536 entries?
Yeah. A char is really one UTF-16 code unit, and for the big entries that is only one half of a surrogate pair. It means any character from the higher planes, like emoji, actually takes up 2 char values.
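You can watch this happen with an emoji. The sketch below builds U+1F600 (a grinning face) from its number so nothing depends on file encoding; SurrogateDemo and its two helper methods are hypothetical names, while the String and Character calls are standard library methods.

```java
final class SurrogateDemo {
    // How many char values (UTF-16 code units) the string occupies.
    static int charCount(String s) {
        return s.length();
    }

    // How many actual Unicode table entries (code points) the string contains.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 is above 65535, so it cannot fit in a single char.
        String grin = new String(Character.toChars(0x1F600));

        System.out.println(charCount(grin));       // 2
        System.out.println(codePointCount(grin));  // 1
        System.out.println(grin.codePointAt(0));   // 128512, i.e. 0x1F600
        System.out.println(Character.isHighSurrogate(grin.charAt(0))); // true
    }
}
```

This is why methods like codePointAt return an int rather than a char: an int is the smallest convenient type that can hold every entry in the table.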