Without boxing - using just `char` - you'd be fine. Likewise, if you use `equals` instead of `==`, you'd be fine. The problem is that you're comparing references for boxed values using `==`, which just checks for reference identity. You're seeing a difference because of the way auto-boxing works. You can see the same thing with `Integer`:
    Object x = 0;
    Object y = 0;
    System.out.println(x == y); // Guaranteed to be true

    Object a = 10000;
    Object b = 10000;
    System.out.println(a == b); // *May* be true
Basically "small" values have cached boxed representations, whereas "larger" values may not.
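As a rough illustration of that caching, here is a minimal sketch. It calls `Integer.valueOf` explicitly, on the assumption that this is what the compiler's auto-boxing ends up invoking (typical for `javac`, but not something the text above states). The class and method names are made up for the example.

```java
// Hypothetical demo class; the names are illustrative only.
class IntegerCacheDemo {
    // Boxes the same int twice and reports whether both boxes are the same object.
    static boolean sameReference(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second; // reference comparison, not value comparison
    }

    // Boxes the same int twice and compares the boxes by value.
    static boolean sameValue(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println(sameReference(127));   // true: values in -128..127 are always cached
        System.out.println(sameReference(10000)); // implementation-dependent, so no output is claimed
        System.out.println(sameValue(10000));     // true: equals compares the wrapped values
    }
}
```

The point of the sketch is that only the `equals` comparison is reliable for every value; the `==` result outside the small cached range depends on the runtime.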
From JLS 5.1.7:
If the value `p` being boxed is an integer literal of type `int` between -128 and 127 inclusive (§3.10.1), or the boolean literal `true` or `false` (§3.10.3), or a character literal between `'\u0000'` and `'\u007f'` inclusive (§3.10.4), then let `a` and `b` be the results of any two boxing conversions of `p`. It is always the case that `a == b`.

Ideally, boxing a primitive value would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rule above is a pragmatic compromise, requiring that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, the rule disallows any assumptions about the identity of the boxed values on the programmer's part. This allows (but does not require) sharing of some or all of these references. Notice that integer literals of type `long` are allowed, but not required, to be shared.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all `char` and `short` values, as well as `int` and `long` values in the range of -32K to +32K.
The part about "a character literal between `\u0000` and `\u007f`" guarantees that boxed ASCII characters will be cached, but not non-ASCII boxed characters.
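To make the `char` case concrete, here is a small sketch comparing boxed characters three ways. It uses `Character.valueOf` directly, assuming that is what auto-boxing compiles to, and the class name, method names, and the choice of `'\u00e9'` as the non-ASCII sample are all hypothetical.

```java
// Hypothetical demo class; the names are illustrative only.
class CharBoxingDemo {
    // Boxes the same char twice and compares the boxes by reference.
    static boolean sameReference(char c) {
        Character first = Character.valueOf(c);
        Character second = Character.valueOf(c);
        return first == second;
    }

    // Boxes the same char twice and compares the boxes by value.
    static boolean sameValue(char c) {
        Character first = Character.valueOf(c);
        Character second = Character.valueOf(c);
        return first.equals(second);
    }

    // Compares two primitive chars; no boxing is involved.
    static boolean samePrimitive(char left, char right) {
        return left == right;
    }

    public static void main(String[] args) {
        System.out.println(sameReference('a'));      // true: ASCII values are always cached
        System.out.println(sameReference('\u00e9')); // not guaranteed either way for non-ASCII
        System.out.println(sameValue('\u00e9'));     // true: equals compares the wrapped chars
        System.out.println(samePrimitive('\u00e9', '\u00e9')); // true: primitive comparison
    }
}
```

Either of the last two approaches (`equals`, or staying with primitive `char`) avoids depending on the cache at all.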