
The compareToIgnoreCase method of the String class is implemented using the method in the snippet below (jdk1.8.0_45).

i. Why are both Character.toUpperCase(char) and Character.toLowerCase(char) used for the comparison? Wouldn't either one alone suffice?

ii. Why was s1.toLowerCase().compareTo(s2.toLowerCase()) not used to implement compareToIgnoreCase? I understand the same logic can be implemented in different ways, but I would still like to know whether there are specific reasons to choose one over the other.

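    // String.CaseInsensitiveComparator.compare (JDK 8); compareToIgnoreCase delegates to CASE_INSENSITIVE_ORDER.compare(this, str)
    public int compare(String s1, String s2) {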
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }
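As a rough illustration that the alternative in question ii is not a drop-in replacement, here is a small sketch contrasting compareToIgnoreCase with "lowercase both strings, then compareTo". The class and the helper lowerThenCompare are hypothetical. It relies on two documented facts: String.toLowerCase(Locale) applies full, locale-sensitive case mapping (so one char can become two), and the no-argument s1.toLowerCase() uses the JVM's default locale, whereas the comparator above works one char at a time with no locale.

```java
import java.util.Locale;

// Hypothetical demo class: contrasts String.compareToIgnoreCase with the
// "lowercase both strings, then compareTo" alternative from question ii.
public class LowerCaseCompareDemo {

    // Hypothetical helper: the alternative approach, with an explicit locale.
    static int lowerThenCompare(String s1, String s2, Locale locale) {
        return s1.toLowerCase(locale).compareTo(s2.toLowerCase(locale));
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Char by char, Character.toLowerCase(U+0130) is plain 'i', so the comparator reports equality.
        System.out.println(dottedI.compareToIgnoreCase("i")); // 0

        // String.toLowerCase uses full case mapping: outside tr/az locales,
        // U+0130 becomes 'i' followed by U+0307 (combining dot), so the strings differ.
        System.out.println(lowerThenCompare(dottedI, "i", Locale.ROOT) != 0); // true

        // With a Turkish locale, "I" lowercases to dotless i (U+0131), not "i".
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("I".compareToIgnoreCase("i")); // 0
        System.out.println(lowerThenCompare("I", "i", turkish) != 0); // true
    }
}
```

Besides the locale dependence, lowercasing may allocate a new string for each argument on every call, which the char-by-char comparator avoids.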

3 Answers


Here's an example using Turkish i's:

    System.out.println(Character.toUpperCase('i') == Character.toUpperCase('İ'));
    System.out.println(Character.toLowerCase('i') == Character.toLowerCase('İ'));

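The first line prints false; the second prints true. 'İ' (U+0130) is already uppercase, so toUpperCase leaves it unchanged while 'i' becomes 'I'; toLowerCase, however, maps both characters to plain 'i'.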
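A slightly expanded, runnable version of the two lines above. The class and the helpers upperOnlyEquals and lowerOnlyEquals are hypothetical; they mimic a comparator that performed only one of the two checks, next to the real compareToIgnoreCase.

```java
// Hypothetical demo: an uppercase-only check misses 'i' vs U+0130,
// while the lowercase check (and therefore the JDK comparator) matches them.
public class TurkishIDemo {

    // Hypothetical helper: compare using only Character.toUpperCase.
    static boolean upperOnlyEquals(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Hypothetical helper: compare using only Character.toLowerCase.
    static boolean lowerOnlyEquals(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char dotted = '\u0130'; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // 'i' uppercases to 'I' (U+0049); U+0130 is already uppercase and stays put.
        System.out.println(upperOnlyEquals('i', dotted)); // false

        // Both lowercase to plain 'i' (U+0069).
        System.out.println(lowerOnlyEquals('i', dotted)); // true

        // The JDK comparator falls through to the lowercase check, so it reports equality.
        System.out.println("i".compareToIgnoreCase("\u0130")); // 0
    }
}
```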

Andy Turner

Some languages have special characters whose uppercase or lowercase form is a different character, or even a sequence of characters.

So comparing using only one case can cause problems with such characters.

For example, the German character eszett (ß) is converted to SS in uppercase. From Wikipedia:

The name eszett comes from the two letters S and Z as they are pronounced in German. Its Unicode encoding is U+00DF.

So comparing a word like groß with gross will fail if only a lowercase comparison is used.


@chrylis, here is a working example:

    System.out.println("ß".toUpperCase().equals("SS"));  // true
    System.out.println("ß".toLowerCase().equals("ss"));  // false

Thanks to @chrylis's comment, I ran some additional tests and found an apparent inconsistency in the String class. Although "ß".toUpperCase() equals "SS", equalsIgnoreCase does not consider the strings equal:

    System.out.println("ß".equalsIgnoreCase("SS"));  // false
    System.out.println("ß".equalsIgnoreCase("ss"));  // false

So there is at least one case where two strings are equal when both are manually converted to uppercase, but are not equal when compared ignoring case.
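A sketch of where this discrepancy plausibly comes from, assuming equalsIgnoreCase compares one char at a time as its documentation describes: String.toUpperCase can expand one char into two, but Character.toUpperCase(char) must return a single char, so U+00DF maps to itself. The class name is hypothetical.

```java
import java.util.Locale;

// Hypothetical demo: full String case mapping vs. per-char case mapping for U+00DF.
public class EszettDemo {
    public static void main(String[] args) {
        String gross = "gro\u00DF"; // "gross" spelled with U+00DF (eszett)

        // String.toUpperCase applies full case mapping: U+00DF becomes "SS".
        System.out.println(gross.toUpperCase(Locale.ROOT)); // GROSS
        System.out.println(gross.toUpperCase(Locale.ROOT)
                .equals("gross".toUpperCase(Locale.ROOT))); // true

        // Character.toUpperCase must return one char, so U+00DF maps to itself.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF'); // true

        // equalsIgnoreCase works char by char; the lengths (4 vs 5) already differ.
        System.out.println(gross.equalsIgnoreCase("gross")); // false
    }
}
```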

Davide Lorenzo MARINO

The Character class documentation for the toUpperCase and toLowerCase methods states:

Note that Character.isUpperCase(Character.toUpperCase(ch)) does not always return true for some ranges of characters, particularly those that are symbols or ideographs.

(toLowerCase has the equivalent note for isLowerCase.)

Since these anomalies can affect the comparison, the implementation checks both cases to give the most accurate result possible.
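To make the "check both cases" point concrete, here is a minimal sketch with a pair that goes the opposite way from the Turkish example: U+017F (LATIN SMALL LETTER LONG S) matches 's' only after uppercasing, so a lowercase-only check would miss it. The class and the helpers lowerOnlyEquals and upperOnlyEquals are hypothetical; the last line shows the documented isUpperCase anomaly for a digit.

```java
// Hypothetical demo: a pair where a lowercase-only check fails but uppercasing succeeds.
public class LongSDemo {

    // Hypothetical helper: compare using only Character.toLowerCase.
    static boolean lowerOnlyEquals(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // Hypothetical helper: compare using only Character.toUpperCase.
    static boolean upperOnlyEquals(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    public static void main(String[] args) {
        char longS = '\u017F'; // LATIN SMALL LETTER LONG S

        // U+017F is already lowercase, so toLowerCase leaves it unchanged: no match with 's'.
        System.out.println(lowerOnlyEquals(longS, 's')); // false

        // Its uppercase mapping is plain 'S', the same as for 's'.
        System.out.println(upperOnlyEquals(longS, 's')); // true

        // The JDK's upper-then-lower check therefore treats them as equal.
        System.out.println("\u017F".equalsIgnoreCase("s")); // true

        // Documented anomaly: toUpperCase on a digit returns the digit, which is not uppercase.
        System.out.println(Character.isUpperCase(Character.toUpperCase('1'))); // false
    }
}
```

Together with the Turkish 'İ' example, this suggests why neither direction alone suffices: each one misses pairs that the other catches.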

Nick Allen