17

My first question here :-)
I did my best to read the rules and to check whether this question had already been asked.

The following code

    String[] strings = {"cAsE", "\u00df"};
    for (String str : strings) {
        System.out.println(str.equalsIgnoreCase(str.toLowerCase()));
        System.out.println(str.equalsIgnoreCase(str.toUpperCase()));
    }

outputs true 3 times (cAsE = case; cAsE = CASE; ß = ß) but also false once (ß != SS). I tried using toLowerCase(Locale) but it didn't help.

Is this a known issue?

targumon
  • 1
    Michael Kaplan has written extensively about the German Sharp S character. Things have changed recently and I'd expect libraries to be playing some catch-up. Lots of good information here: http://blogs.msdn.com/michkap/archive/2008/05/15/8506679.aspx – Aidan Ryan Aug 26 '09 at 11:31

4 Answers

11

Until recently, Unicode didn't define an uppercase version of s-sharp. I'm not sure whether the latest Java 7 version already includes this new character and whether it handles it correctly. I suggest giving it a try.

The reason why str.toLowerCase() doesn't return the same as str.toUpperCase().toLowerCase() is that toUpperCase() replaces ß with SS and there is no way to go back, so SS becomes ss and the comparison fails.

So if you need to level the case, you must use str.toLowerCase(). If not, then simply calling equalsIgnoreCase() without any upper/lower conversion should work, too.
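The round trip described above can be reproduced with a short sketch. The class name `SharpSCaseDemo` and the helper `roundTrip` are made up for illustration, and `Locale.ROOT` is only an assumption used to keep the result independent of the default locale.

```java
import java.util.Locale;

public class SharpSCaseDemo {
    // Hypothetical helper: upper-case, then lower-case again.
    // toUpperCase expands "ß" to "SS", and lower-casing "SS" gives "ss", not "ß".
    static String roundTrip(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String sharpS = "\u00df"; // ß

        System.out.println(sharpS.toUpperCase(Locale.ROOT)); // SS
        System.out.println(roundTrip(sharpS));               // ss

        // equalsIgnoreCase compares char by char, so strings of
        // different length ("ß" vs "SS" or "ss") can never match.
        System.out.println(sharpS.equalsIgnoreCase(roundTrip(sharpS))); // false

        // Lower-casing leaves "ß" unchanged, so this comparison succeeds.
        System.out.println(sharpS.equalsIgnoreCase(sharpS.toLowerCase(Locale.ROOT))); // true
    }
}
```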

Aaron Digulla
  • 1
    Even if Java 7 supports the new Unicode character, "ß".toUpperCase() must still return "SS", since the upper-case "ß" is only of typographical interest and not really used in the wild: http://en.wikipedia.org/wiki/Capital_ß – Joachim Sauer Aug 26 '09 at 11:39
  • In my case I'm trying to match some users' strings with predefined ones (maybe I should have mentioned it in the original question...) So the code I gave here as an example is just a test I performed to understand why my original code didn't work as expected. Obviously the equalsIgnoreCase method exists to save us from changing the case of either strings. Anyway, the concept of "leveling" is what makes this my accepted answer :-) – targumon Aug 26 '09 at 12:38
2

Aaron Digulla has it. Also, it isn't meaningful to transform the string in the absence of locale data. In English, the upper case of i is I, but in Turkish it is İ. String.equalsIgnoreCase and String.compareToIgnoreCase do not take locale data into account.
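As a rough illustration of the locale point, the sketch below upper-cases and lower-cases the letter i under an English and a Turkish locale. The class name `TurkishCaseDemo` and its helper methods are hypothetical; code points are printed as hex so the output does not depend on console encoding.

```java
import java.util.Locale;

public class TurkishCaseDemo {
    // Hypothetical helpers: case-convert under the locale named by a language tag.
    static String upper(String s, String languageTag) {
        return s.toUpperCase(Locale.forLanguageTag(languageTag));
    }

    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // English: i -> I (U+0049)
        System.out.println(Integer.toHexString(upper("i", "en").charAt(0))); // 49
        // Turkish: i -> İ (U+0130, capital I with dot above)
        System.out.println(Integer.toHexString(upper("i", "tr").charAt(0))); // 130
        // Turkish: I -> ı (U+0131, dotless small i)
        System.out.println(Integer.toHexString(lower("I", "tr").charAt(0))); // 131
    }
}
```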

(As an aside, you might want to look into normalization, or you'll end up wondering why "é".equals("é") can return false. Reason: one is a combining sequence.)
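A minimal sketch of that normalization issue, using `java.text.Normalizer`. The class name `NormalizationDemo` and the helper `equalsNormalized` are illustrative, and NFC is just one reasonable choice of form.

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // Hypothetical helper: compare two strings after normalizing both to NFC.
    static boolean equalsNormalized(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9"; // é as a single code point
        String combining = "e\u0301";  // e followed by a combining acute accent

        // Both render as é, but the char sequences differ.
        System.out.println(precomposed.equals(combining));           // false
        System.out.println(equalsNormalized(precomposed, combining)); // true
    }
}
```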

McDowell
  • It is weird to me that the String class has 2 methods each for toLowerCase & toUpperCase (one without parameters + one that accepts a Locale) but only 1 method each for equalsIgnoreCase & compareToIgnoreCase. If the guys at Sun think the *case methods should be Locale sensitive, then I expect all of them to also accept it. Thanks for the normalization link; it's overkill for my case, but insightful all the same. – targumon Aug 26 '09 at 12:22
  • @targumon: Note that in *all* locales `"ß".toUpperCase(locale)` returns `"SS"`, yet equalsIgnoreCase doesn't care. It's all broken somehow. – maaartinus Sep 13 '13 at 18:27
2

"Unicode didn't define an uppercase version of s-sharp": this is the exact point. In the German language a sharp s (ß) can never be a capital or the initial letter of any word, so it is just nonsense to argue about a capital ß...

Gnark
0

Hm. I don't know anything about the German language, but I'm not sure how I feel about Unicode characters being treated as equivalent to some Roman-letter expansion. Should you be able to do the following?

    myDictionary.put("glasses", new Bifocals());
    myDictionary.get("glaßes");

If you have your druthers, myDictionary.get("glaßes") should return the Bifocals from before. Is that legit?

John Feminella
  • 2
    "ß" and "ss" are not equivalent. "ss" is sometimes used to write "ß" when that letter is not available. Since there is no upper-case "ß" (ok, there is one, but it's mostly a typographical curiosity and not a letter that's used in reality) it will always be written as "SS" in ALL CAPS. The opposite is not true: "SS".toLowerCase() is definitely "ss". – Joachim Sauer Aug 26 '09 at 11:37
  • Ah, gotcha. Thanks for the clarification, Joachim. – John Feminella Aug 26 '09 at 11:44