We all know that String's equals() method is a poor fit for linguistic equality comparison, because it only matches identical character sequences. Instead, one should use a Collator, like this:
import java.text.Collator;
import java.util.Locale;

// The UI locale would normally be detected at runtime; it is hard-coded here
Locale uiLocale = Locale.forLanguageTag("da-DK");
// Setting up collator object
Collator collator = Collator.getInstance(uiLocale);
collator.setStrength(Collator.SECONDARY);
collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
// strings for equality testing
String test1 = "USA lover Grækenland støtte";
String test2 = "USA lover graekenland støtte";
boolean result = collator.equals(test1, test2);
Now, this code works, that is, result is true, unless uiLocale is set to Danish, in which case it yields false. I understand why this happens: the equals method is implemented like this:
return compare(s1, s2) == Collator.EQUAL;
This method calls the same compare method that is used for sorting and checks whether it reports the strings as equal. It does not, because the Danish-specific collation rules require æ to be sorted after ae (if I understand the result of the compare method correctly). However, I would expect these strings to count as the same: at this strength, both case differences and such compatibility characters (I believe that is what they are called) should be treated as equal.
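To make the strength semantics concrete, here is a minimal sketch that uses only java.text.Collator. The class name StrengthDemo, the helper english() and the sample strings are made up for illustration. It uses an English collator so that no Danish tailoring is involved.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; the names and sample strings are illustrative only.
final class StrengthDemo {

    // Builds an English collator with the given strength.
    static Collator english(int strength) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(strength);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    public static void main(String[] args) {
        String lower = "resume";
        String upper = "RESUME";
        String accented = "r\u00e9sum\u00e9"; // "resume" with acute accents

        // Case is a tertiary difference: ignored at SECONDARY, detected at TERTIARY.
        System.out.println(english(Collator.SECONDARY).equals(lower, upper));
        System.out.println(english(Collator.TERTIARY).equals(lower, upper));

        // Accents are a secondary difference: ignored at PRIMARY, detected at SECONDARY.
        System.out.println(english(Collator.PRIMARY).equals(lower, accented));
        System.out.println(english(Collator.SECONDARY).equals(lower, accented));

        // equals(String, String) agrees with compare(...) == 0.
        Collator c = english(Collator.SECONDARY);
        System.out.println(c.equals(lower, accented) == (c.compare(lower, accented) == 0));
    }
}
```

Since equals is just compare against zero, any locale rule that gives æ its own position in the alphabet makes the two spellings unequal regardless of the strength chosen.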
To fix this, one could use a RuleBasedCollator with a specific set of rules that works for the equality case.
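A rough sketch of what I have in mind, following the pattern from the RuleBasedCollator javadoc of appending rules to an existing collator's getRules(). The class name EqualityCollators, the method forEquality and, above all, the tailoring string are my own assumptions: the appended rule is meant to turn æ back into a tertiary variant of the sequence ae, but I have not verified it against every locale's rules, which is exactly the part I am asking about.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper; class, method and rule string are illustrative assumptions.
final class EqualityCollators {

    // Assumed tailoring: treat the ligatures as tertiary variants of "ae" / "AE".
    static final String LIGATURE_RULES = "& ae , \u00e6 & AE , \u00c6";

    // Returns a collator for equality tests, based on the locale's own rules.
    static Collator forEquality(Locale locale) {
        Collator result = Collator.getInstance(locale);
        if (result instanceof RuleBasedCollator) {
            String rules = ((RuleBasedCollator) result).getRules();
            try {
                // Later rules are intended to override earlier ones for the same character.
                result = new RuleBasedCollator(rules + LIGATURE_RULES);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Tailoring rejected for " + locale, e);
            }
        }
        result.setStrength(Collator.SECONDARY);
        result.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return result;
    }

    public static void main(String[] args) {
        Collator collator = forEquality(Locale.forLanguageTag("da-DK"));
        String test1 = "USA lover Gr\u00e6kenland st\u00f8tte";
        String test2 = "USA lover graekenland st\u00f8tte";
        // Prints whether the tailored collator considers the two strings equal.
        System.out.println(collator.equals(test1, test2));
    }
}
```

The obvious drawback is that such a rule string has to be written and checked by hand for every language, which is why I am looking for a ready-made source.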
Finally, the question is: does anyone know where I can get such rules (not only for Danish, but for other languages as well), so that compatibility characters, ligatures, etc. are treated as equal? The CLDR charts do not seem to contain them, or I failed to find them.
Or maybe I am trying to do something stupid here, and I should simply use the UCA for equality comparison (any code sample, please)?
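For reference, the closest thing I can sketch with the standard library alone is to keep the locale-specific collator for sorting and use a root-locale collator for equality. This assumes that Collator.getInstance(Locale.ROOT) is an acceptable stand-in for the UCA, which it strictly is not; a library such as ICU4J would be needed for the real thing. The class and method names below are made up for illustration.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper; names are illustrative, and the root collator is only
// an approximation of UCA behaviour, not an implementation of it.
final class RootEquality {

    // Locale-independent collator, used only for equality tests.
    static Collator equalityCollator() {
        Collator c = Collator.getInstance(Locale.ROOT);
        c.setStrength(Collator.SECONDARY);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    // Locale-specific collator, kept for sorting in the user's language.
    static Collator sortingCollator(Locale uiLocale) {
        return Collator.getInstance(uiLocale);
    }

    // True when the root collator sees no primary or secondary difference.
    static boolean sameText(String a, String b) {
        return equalityCollator().compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String test1 = "USA lover Gr\u00e6kenland st\u00f8tte";
        String test2 = "USA lover graekenland st\u00f8tte";
        // Prints the equality verdict of the locale-independent collator.
        System.out.println(sameText(test1, test2));
    }
}
```

With this split, Danish sorting still places æ where Danish users expect it, while the equality check no longer depends on uiLocale.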