I've run into something strange while working with Java. Maybe it's ordinary behaviour, but I don't understand why it works this way.

I have code like this:

Character x = 'B';
Object o = x;
System.out.println(o == 'B');

It works fine and the output is "true". Then I change the English B to the Cyrillic B (Б):

Character x = 'Б';
Object o = x;
System.out.println(o == 'Б');

Now the output is "false". How come? By the way, the output is still "true" if I compare the x variable with 'Б' directly, but when I go through an Object it behaves differently.

Can anyone please explain this behaviour?

user2452103

2 Answers

Without boxing (using just char) you'd be fine. Likewise, if you use equals instead of ==, you'd be fine. The problem is that you're applying == to boxed values, which only checks reference identity. You're seeing a difference because of the way auto-boxing caches some values but not others. You can see the same thing with Integer:

Object x = 0;
Object y = 0;
System.out.println(x == y); // Guaranteed to be true

Object a = 10000;
Object b = 10000;
System.out.println(a == b); // May be true or false; not guaranteed

Basically "small" values have cached boxed representations, whereas "larger" values may not.

From JLS 5.1.7:

If the value p being boxed is an integer literal of type int between -128 and 127 inclusive (§3.10.1), or the boolean literal true or false (§3.10.3), or a character literal between '\u0000' and '\u007f' inclusive (§3.10.4), then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.

Ideally, boxing a primitive value would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rule above is a pragmatic compromise, requiring that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, the rule disallows any assumptions about the identity of the boxed values on the programmer's part. This allows (but does not require) sharing of some or all of these references. Notice that integer literals of type long are allowed, but not required, to be shared.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.

The part about "a character literal between '\u0000' and '\u007f'" guarantees that boxed ASCII characters will be cached, but not non-ASCII boxed characters.
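As a minimal sketch of the two fixes mentioned at the top of this answer (comparing with equals, or unboxing before using ==), something like the following should work. The class and method names are made up for illustration, and 'Б' is written as the escape '\u0411' only so the result does not depend on the source file encoding.

```java
public class BoxedCharCompare {
    // Hypothetical helper: compares by value. The char argument is boxed,
    // and Character.equals compares the wrapped char values.
    static boolean sameByEquals(Object o, char c) {
        return o.equals(c);
    }

    // Hypothetical helper: unboxes first, so == compares primitive chars
    // rather than object references.
    static boolean sameByUnboxing(Object o, char c) {
        return o instanceof Character && ((Character) o).charValue() == c;
    }

    public static void main(String[] args) {
        Object ascii = 'B';
        Object cyrillic = '\u0411'; // 'Б'

        System.out.println(sameByEquals(ascii, 'B'));           // true
        System.out.println(sameByEquals(cyrillic, '\u0411'));   // true
        System.out.println(sameByUnboxing(ascii, 'B'));         // true
        System.out.println(sameByUnboxing(cyrillic, '\u0411')); // true
    }
}
```

Both helpers give the same answer inside and outside the cached range, because neither depends on reference identity.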

Jon Skeet

When you write

Character x = 'B';

the compiler emits a call to Character.valueOf(char), as the bytecode shows:

2: invokestatic  #16                 // Method java/lang/Character.valueOf:(C)Ljava/lang/Character;

which caches small values, as its Javadoc states:

This method will always cache values in the range '\u0000' to '\u007F', inclusive, and may cache other values outside of this range.

public static Character valueOf(char c) {
    if(c <= 127) { // must cache
        return CharacterCache.cache[(int)c];
    }
    return new Character(c);
}
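To see the effect of that cache directly, here is a small sketch that calls Character.valueOf twice and compares the references. The class and method names are hypothetical. Only the result for the '\u0000' to '\u007F' range is guaranteed; for other values the outcome depends on the implementation, so no output is claimed for that line.

```java
public class ValueOfCacheDemo {
    // Hypothetical helper: true when two valueOf calls return the same object.
    static boolean sameInstance(char c) {
        Character first = Character.valueOf(c);
        Character second = Character.valueOf(c);
        return first == second; // reference comparison, on purpose
    }

    public static void main(String[] args) {
        // Inside the guaranteed cache range: always the same instance.
        System.out.println(sameInstance('B')); // true

        // Outside the range ('\u0411' is 'Б'): unspecified, may be true or false.
        System.out.println(sameInstance('\u0411'));

        // equals compares the wrapped values, so it does not depend on the cache.
        System.out.println(Character.valueOf('\u0411').equals(Character.valueOf('\u0411'))); // true
    }
}
```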


jmj