In C# it appears that Grüsse and Grüße are considered equal in most circumstances, as is explained by this nice webpage. I'm trying to find similar behavior in Java, obviously not in java.lang.String.
I thought I was in luck with java.util.regex.Pattern in combination with Pattern.UNICODE_CASE. The Javadoc says:
UNICODE_CASE enables Unicode-aware case folding. When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard.
Yet the following code:
Pattern p = Pattern.compile(Pattern.quote("Grüsse"),
Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
System.out.println(p.matcher("Grüße").matches());
yields false. Why? And is there an alternative way of reproducing the C# case folding behavior?
---- edit ----
As @VGR pointed out, String.toUpperCase will convert ß to SS, which may or may not be case folding (maybe I'm confusing concepts here). However, other characters in the German locale are not "folded"; for instance, ü does not become UE. So to make my initial example more complete: is there a way to make Grüße and Gruesse compare equal in Java?
I was thinking the java.text.Normalizer class could be used to do just that, but it converts ü to u? rather than ue. It also has no option to provide a Locale, which confuses me even more.
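To make the Normalizer observation concrete, here is a small sketch that prints the code points produced by canonical decomposition (NFD). It suggests that the "u?" is most likely a plain u followed by the combining diaeresis U+0308, which a console that cannot render it may show as "?". The class NormalizerDemo and its helpers nfd, codePoints and stripDiacritics are hypothetical names used only for this illustration. Unicode normalization is defined independently of locale, which would explain why Normalizer takes no Locale argument.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

public class NormalizerDemo {

    // Canonical decomposition: a precomposed letter such as u-umlaut is
    // split into a base letter plus a combining mark.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Formats every code point of the string as U+XXXX for inspection.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Decomposes, then removes all combining marks (Unicode category M).
    // The sharp s has no decomposition, so it passes through unchanged.
    static String stripDiacritics(String s) {
        return nfd(s).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String umlaut = "\u00fc"; // u-umlaut as a single precomposed character

        System.out.println(codePoints(umlaut));      // U+00FC
        System.out.println(codePoints(nfd(umlaut))); // U+0075 U+0308

        // Yields "Gru" + sharp s + "e": the umlaut is dropped rather than
        // transliterated to "ue", and the sharp s is untouched.
        System.out.println(stripDiacritics("Gr\u00fc\u00dfe"));
    }
}
```

So normalization on its own can strip the diaeresis but does not perform the German ü to ue transliteration asked about here.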