11

Is it safe to use ==/!= while comparing Character?

Since Character is a boxed type, is it safe to use ==/!= when comparing Character values?

public static void main(String[] args) {

    Character c1 = 'd';
    Character c2 = (char) getInt();

    System.out.println(c1 == c2);
}

public static int getInt() {

    return 100;
}

The above works as expected (prints true). However, are there cases where comparing Character objects holding the same value using == would yield false? (Hence, do we have to use .equals() when comparing boxed primitive types?)

Mark Rotteveel
user7858768
    Trivially `new Character('a') == new Character('a')` is false. Yes you should use `equals` when comparing reference types. – Sweeper Oct 08 '21 at 17:19
    Does this answer your question? [Why is 128==128 false but 127==127 is true when comparing Integer wrappers in Java?](https://stackoverflow.com/questions/1700081/why-is-128-128-false-but-127-127-is-true-when-comparing-integer-wrappers-in-ja) – Progman Oct 08 '21 at 17:20
    It is never a good idea to use `==` with objects. – Mark Rotteveel Oct 08 '21 at 17:21

2 Answers

15

No, it's not safe. You must use equals().

Demonstration:

System.out.println(Character.valueOf('Ü') == Character.valueOf('Ü'));
// -> false

Note that if you use autoboxing or Character.valueOf(), then some characters (ASCII characters) are cached and the same Character instance is reused, so == may return true for the same value:

System.out.println(Character.valueOf('A') == Character.valueOf('A'));
// -> true (on my machine)

But it doesn't work for all characters, and it won't work if you call the deprecated new Character(...) explicitly.
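To make the contrast concrete, here is a small self-contained sketch (the class name is mine) showing == against equals() for both a cached and an uncached character. Note that the false result for the uncached pair is what typical JDKs do; the spec only guarantees caching for '\u0000' through '\u007F'.

```java
// Sketch: == is identity, equals() is value comparison.
public class CharacterEqualsDemo {
    public static void main(String[] args) {
        Character cached1 = Character.valueOf('A');   // within the guaranteed cache range
        Character cached2 = Character.valueOf('A');
        Character uncached1 = Character.valueOf('Ü'); // U+00DC, outside the cache range
        Character uncached2 = Character.valueOf('Ü');

        System.out.println(cached1 == cached2);          // true — same cached instance
        System.out.println(uncached1 == uncached2);      // false on typical JDKs — distinct instances
        System.out.println(cached1.equals(cached2));     // true — compares the char value
        System.out.println(uncached1.equals(uncached2)); // true — equals() is always safe
    }
}
```

So equals() gives the right answer in every case, while == only happens to work inside the cached range.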

Alex Shesterov
  • Quote from `java.lang.Character#valueOf(char)` comment: "This method will always cache values in the range `\u0000` to `\u007F`, inclusive, and may cache other values outside of this range". The code itself, however, does not include any "other values". – Vasily Liaskovsky Oct 08 '21 at 17:26
1

tl;dr

Use code points, not char/Character.

"d".codePointAt( 0 ) == 100  // true.

Details

The Answer by Alex Shesterov is correct. But bigger picture, you should not be using Character objects.

Character is broken

The Character class is a wrapper class for the primitive type char. The char/Character type is legacy as of Java 2, and is essentially broken. As a 16-bit value, it is physically incapable of representing most characters.

For example, the following line will not even compile, because an emoji (here 😷, an illustrative choice) lies beyond the 16-bit range and cannot fit in a single char literal:

 System.out.println( Character.valueOf( '😷' ) ) ;
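A related sketch (my example, not part of the original answer) of the same limitation: a single emoji occupies two char units (a surrogate pair), so char-based counting goes wrong while code-point methods get it right.

```java
// Sketch: char counts vs. code-point counts for a character outside the BMP.
public class SurrogateDemo {
    public static void main(String[] args) {
        String thumbsUp = "\uD83D\uDC4D"; // 👍 THUMBS UP SIGN, U+1F44D, as a surrogate pair

        System.out.println(thumbsUp.length());                             // 2 — two char units
        System.out.println(thumbsUp.codePointCount(0, thumbsUp.length())); // 1 — one actual character
        System.out.println(thumbsUp.codePointAt(0));                       // 128077 — the real code point, 0x1F44D
    }
}
```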

Code points

Instead, when working with individual characters, use code point integer numbers. In Java that means using the int/Integer type.

If you look around classes such as String, StringBuilder, and Character you will find codePoint methods.

Let's revise your code snippet. We will change the names to be more descriptive. We switch out Character and char usage for mere int primitive integers. As such, we can compare our int values using == or !=.

package work.basil.text;

public class App7
{
    public static void main ( String[] args )
    {
        int codePointOf_LATIN_SMALL_LETTER_D = "d".codePointAt( 0 ); // Annoying zero-based index counting, not ordinal.
        int codePoint2 = getInt();

        boolean sameCharacter = ( codePointOf_LATIN_SMALL_LETTER_D == codePoint2 );  // Comparing `int` primitives with double-equals. 
        System.out.println( sameCharacter );
    }

    public static int getInt ()
    {
        return 100;  // Code point 100 is LATIN SMALL LETTER D, `d`. 
    }
}

When run:

true

Of course, if you use auto-boxing or otherwise mix the wrapper class Integer with the primitive int, then the same explanation in that other Answer applies here too.
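To sketch that caveat (this snippet is mine): the JLS guarantees that autoboxing caches Integer values from -128 to 127, so == happens to work there but fails just outside, on default JVM settings (the upper bound is tunable via -XX:AutoBoxCacheMax).

```java
// Sketch: the Integer cache makes == unreliable for boxed values.
public class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer small1 = 127, small2 = 127; // autoboxing calls Integer.valueOf
        Integer big1 = 128, big2 = 128;

        System.out.println(small1 == small2);  // true — within the guaranteed cache range
        System.out.println(big1 == big2);      // false on default JVM settings — distinct instances
        System.out.println(big1.equals(big2)); // true — equals() always compares the value
    }
}
```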

Basil Bourque
  • There is nothing broken about `char` and `Character`. The legacy of Java 2 is that they are 16-bit. If Java were to be designed today, they would be 8-bit as in Go. In that case Java Strings would be UTF-8 encoded instead of UTF-16 encoded. UTF-8 encoding is more compact than UTF-16. – Alexey Veleshko Dec 19 '21 at 10:23
  • The concern for memory efficiency even made Java developers devise an alternative internal representation for the String class. Traditionally it was always UTF-16 but now String will try to use an 8-bit Latin-1 encoding if possible. – Alexey Veleshko Dec 19 '21 at 10:26