Java: Why does String.compareToIgnoreCase() use both Character.toUpperCase() and Character.toLowerCase()?

Question

The compareToIgnoreCase method of the String class is implemented using the method in the snippet below (jdk1.8.0_45).

i. Why are both Character.toUpperCase(char) and Character.toLowerCase(char) used for the comparison? Wouldn't either one of them suffice?

ii. Why was s1.toLowerCase().compareTo(s2.toLowerCase()) not used to implement compareToIgnoreCase? I understand that the same logic can be implemented in different ways, but I would still like to know whether there are specific reasons to choose one over the other.

    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }

`s1.toLowerCase().compareTo(s2.toLowerCase())` requires the creation of two new `String` objects. The actual implementation creates no additional objects. — Andy Turner, Aug 22 '16 at 16:04
I'm guessing there's some particularly weird Unicode corner case out there. — chrylis -cautiouslyoptimistic-, Aug 22 '16 at 16:06
http://stackoverflow.com/a/16083264/3788176 suggests "in Turkish Locale there are two different uppercase "i" letters". — Andy Turner, Aug 22 '16 at 16:07
@AndyTurner But there are also corresponding lowercase "i"s. — chrylis -cautiouslyoptimistic-, Aug 22 '16 at 16:09
Thanks Andy. I was unaware that an alphabet might have multiple uppercase characters. That also explains why uppercase characters are checked before lowercase characters! — Sendhilkumar Alalasundaram, Aug 22 '16 at 16:10
I considered this a duplicate of the linked question, and/or the one that the linked one was a duplicate of ( http://stackoverflow.com/questions/15518731/understanding-logic-in-caseinsensitivecomparator ). In any case, the crucial points are already summarized elsewhere. — Marco13, Aug 22 '16 at 16:14
Isn't it also because some characters, like the German ß (scharfes s), have no uppercase? — biziclop, Aug 22 '16 at 16:14
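
The allocation and locale points raised in these comments can be sketched as follows. This is only an illustration: `compareViaLowerCase` is a hypothetical helper standing in for the alternative proposed in part ii of the question, not anything from the JDK. `String.toLowerCase(Locale)` applies locale-specific rules, while `compareToIgnoreCase` does not take locale into account.

```java
import java.util.Locale;

class LowerCaseCompareDemo {
    // Hypothetical alternative from the question: lowercase both strings, then compare.
    // May allocate up to two temporary strings, and the result depends on the locale.
    static int compareViaLowerCase(String s1, String s2, Locale locale) {
        return s1.toLowerCase(locale).compareTo(s2.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale-independent, character-by-character comparison.
        System.out.println("TITLE".compareToIgnoreCase("title") == 0);                // true

        // With root-locale rules the two approaches agree.
        System.out.println(compareViaLowerCase("TITLE", "title", Locale.ROOT) == 0);  // true

        // Under Turkish rules 'I' lowercases to dotless \u0131 rather than 'i',
        // so the lowercased strings differ.
        System.out.println(compareViaLowerCase("TITLE", "title", turkish) == 0);      // false
    }
}
```

Passing the locale explicitly keeps the sketch independent of the machine's default locale, which is what the no-argument `toLowerCase()` would use.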

score 7 · Accepted Answer · answered Aug 22 '16 at 16:10

Here's an example using Turkish i's:

    System.out.println(Character.toUpperCase('i') == Character.toUpperCase('İ'));
    System.out.println(Character.toLowerCase('i') == Character.toLowerCase('İ'));

The first line prints false; the second prints true.
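
The reverse situation also exists, which is why neither conversion alone is enough. The sketch below uses the two Turkish letters İ (U+0130) and ı (U+0131), written as Unicode escapes so the result does not depend on source-file encoding. The helper names (`equalUpperOnly`, `equalLowerOnly`, `equalBoth`) are made up for this illustration; `equalBoth` mirrors the order of checks in the snippet from the question.

```java
class CaseFoldDemo {
    // Compare after converting both characters to upper case only.
    static boolean equalUpperOnly(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Compare after converting both characters to lower case only.
    static boolean equalLowerOnly(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // Same order of checks as the JDK snippet: raw, then upper, then lower of the upper.
    static boolean equalBoth(char a, char b) {
        if (a == b) return true;
        char u1 = Character.toUpperCase(a);
        char u2 = Character.toUpperCase(b);
        if (u1 == u2) return true;
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    public static void main(String[] args) {
        char dottedCapitalI = '\u0130'; // İ
        char dotlessSmallI = '\u0131';  // ı

        // 'i' vs İ: the upper-case forms differ, the lower-case forms match.
        System.out.println(equalUpperOnly('i', dottedCapitalI)); // false
        System.out.println(equalLowerOnly('i', dottedCapitalI)); // true

        // 'I' vs ı: the upper-case forms match, the lower-case forms differ.
        System.out.println(equalUpperOnly('I', dotlessSmallI));  // true
        System.out.println(equalLowerOnly('I', dotlessSmallI));  // false

        // Checking both conversions treats each pair as equal.
        System.out.println(equalBoth('i', dottedCapitalI));      // true
        System.out.println(equalBoth('I', dotlessSmallI));       // true
    }
}
```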

Andy Turner


Davide Lorenzo MARINO · Answer 2 · 2016-08-22T16:26:26.267

Some languages have special characters whose case conversion yields a different character, or even a sequence of characters.

So converting to only one case can cause problems for such characters.

For example, the German Eszett (ß) is converted to SS in upper case. From Wikipedia:

The name eszett comes from the two letters S and Z as they are pronounced in German. Its Unicode encoding is U+00DF.

So comparing a word like groß with gross will fail if only a lowercase comparison is used.

@chrylis, here is a working example:

    System.out.println("ß".toUpperCase().equals("SS"));  // true
    System.out.println("ß".toLowerCase().equals("ss"));  // false

Thanks to @chrylis's comment, I ran some additional tests and found a possible error in the String class. The two conversions above behave as shown, but:

    System.out.println("ß".equalsIgnoreCase("SS"));  // false
    System.out.println("ß".equalsIgnoreCase("ss"));  // false

So there is at least one case where two strings are equal when both are manually converted to upper case, but are not equal when compared ignoring case.

That specific case doesn't work in Java, as `SS` is two characters. — chrylis -cautiouslyoptimistic-, Aug 22 '16 at 16:10
As the comparison above is done character by character, ß will never be compared with 'SS'. So the equality condition above will be true only when both characters are 'ß'. How does it have an effect in the above case? — Sendhilkumar Alalasundaram, Aug 22 '16 at 16:25
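
The point made in these comments can be checked directly. In the sketch below (class name chosen only for illustration), `String.toUpperCase` is allowed to expand ß into two characters, whereas `Character.toUpperCase(char)` maps one `char` to one `char` and leaves ß unchanged. Since the ignore-case methods pair characters by index, they never see "SS" as a counterpart of ß.

```java
import java.util.Locale;

class SharpSDemo {
    public static void main(String[] args) {
        char sharpS = '\u00DF'; // ß

        // String-level conversion may change the length of the text.
        String upper = String.valueOf(sharpS).toUpperCase(Locale.ROOT);
        System.out.println(upper + " (length " + upper.length() + ")"); // SS (length 2)

        // char-level conversion cannot expand, so ß maps to itself.
        System.out.println(Character.toUpperCase(sharpS) == sharpS);    // true

        // The ignore-case methods compare char by char, so neither treats ß as "ss".
        System.out.println("\u00DF".equalsIgnoreCase("SS"));                // false
        System.out.println("gro\u00DF".compareToIgnoreCase("gross") == 0);  // false
    }
}
```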

score 2 · Answer 3 · answered Aug 22 '16 at 16:10

The Character class documentation for the toUpperCase and toLowerCase methods states:

Note that Character.isUpperCase(Character.toUpperCase(ch)) does not
always return true for some ranges of characters, particularly those
that are symbols or ideographs. (The toLowerCase documentation has a similar note.)

Since there is the potential for anomalies in the comparison, they check for both cases to give the most accurate response possible.
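
A quick way to see the documented behavior is to run a few characters through both calls. This is a minimal sketch; `upperCasedIsUpper` is a hypothetical helper name, and the sample characters are arbitrary picks of a letter, a digit, a symbol, and an ideograph.

```java
class IsUpperCaseDemo {
    // True when the upper-cased form of ch is itself classified as upper case.
    static boolean upperCasedIsUpper(char ch) {
        return Character.isUpperCase(Character.toUpperCase(ch));
    }

    public static void main(String[] args) {
        System.out.println(upperCasedIsUpper('a'));      // true: 'A' is an uppercase letter
        System.out.println(upperCasedIsUpper('1'));      // false: digits have no case
        System.out.println(upperCasedIsUpper('+'));      // false: symbols have no case
        System.out.println(upperCasedIsUpper('\u4E2D')); // false: CJK ideograph, no case
    }
}
```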
