
I would like the Collator to sort strings correctly even when the accented letters they contain are not defined in the requested locale's rules.

import java.text.Collator;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

public class CollatorSortExample // class name is illustrative
{
    public static void main(String[] args)
    {
        List<String> strings = new LinkedList<>();
        strings.add("Zverina");
        strings.add("Zulu");
        strings.add("Žurerka");
        // Correct order for the Slovenian locale: [Zulu, Zverina, Žurerka]

        Collections.sort(strings, new MyComparator(Locale.forLanguageTag("en-GB")));
        System.out.println(strings);

        Collections.sort(strings, new MyComparator(Locale.forLanguageTag("sl-SI")));
        System.out.println(strings);
    }

    // Thin Comparator that delegates to the Collator for the given locale
    private static class MyComparator implements Comparator<String>
    {
        private final Collator collator;

        public MyComparator(Locale locale)
        {
            collator = Collator.getInstance(locale);
        }

        @Override
        public int compare(String s1, String s2)
        {
            return collator.compare(s1, s2);
        }
    }
}

The code above sorts the list to [Zulu, Žurerka, Zverina] with en-GB and to [Zulu, Zverina, Žurerka] with sl-SI. I would like to get the second (Slovenian) result with the en-GB locale as well.

For example, when the en-GB rules treat Z and Ž as the same letter, I would like to specify a fallback locale (sl-SI in this case) whose rules are used instead.
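One way to approximate this fallback idea, sketched here under assumptions rather than as a known-good solution: compare first with the fallback locale's Collator at PRIMARY strength, so that base-letter distinctions it defines (such as Ž being a separate letter after Z in Slovenian) decide the order, and only when it reports a tie use the requested locale's full rules. The class `FallbackCollatorComparator` and its `sorted` helper are hypothetical names, not JDK API. Note that this lets the fallback locale's base-letter order dominate everywhere, not only for letters the requested locale treats as equal.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not a JDK class): orders strings by the fallback locale's
// base letters first, then breaks ties with the requested locale's full rules.
final class FallbackCollatorComparator implements Comparator<String> {
    private final Collator fallbackPrimary;
    private final Collator requested;

    FallbackCollatorComparator(Collator requested, Collator fallback) {
        // Work on a copy so the caller's fallback Collator keeps its own strength.
        this.fallbackPrimary = (Collator) fallback.clone();
        this.fallbackPrimary.setStrength(Collator.PRIMARY);
        this.requested = requested;
    }

    FallbackCollatorComparator(Locale requested, Locale fallback) {
        this(Collator.getInstance(requested), Collator.getInstance(fallback));
    }

    @Override
    public int compare(String s1, String s2) {
        // Step 1: base-letter order from the fallback locale (sl-SI puts Ž after Z).
        int result = fallbackPrimary.compare(s1, s2);
        if (result != 0) {
            return result;
        }
        // Step 2: remaining accent and case differences use the requested locale.
        return requested.compare(s1, s2);
    }

    // Hypothetical convenience method: returns a sorted copy of the input.
    static List<String> sorted(List<String> input, Locale requested, Locale fallback) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(new FallbackCollatorComparator(requested, fallback));
        return copy;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("Zverina", "Zulu", "\u017Durerka");
        System.out.println(sorted(strings,
                Locale.forLanguageTag("en-GB"), Locale.forLanguageTag("sl-SI")));
    }
}
```

If the installed JDK locale data orders sl-SI the way the question reports, `main` should print the Slovenian order even though en-GB is the requested locale. The fallback Collator is cloned before `setStrength` is called, because `Collator` instances are mutable and the caller may still need the original settings.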

I have tried adjusting the Collator's strength and decomposition settings, without success.
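A plausible reason these settings cannot help: strength only decides which kinds of differences (base letter, accent, case) count, and decomposition only decides how precomposed characters are normalized before comparing; neither gives Ž a new base-letter position. Because en-GB places Žurerka between Zulu and Zverina, Ž evidently shares Z's base-letter weight there, so the first base-letter difference (u versus v) already decides the order. The sketch below (class name `StrengthDemo` is illustrative) sorts the three strings with an en-GB Collator under every combination of the two settings; each combination is expected to give [Zulu, Žurerka, Zverina].

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sorts the question's three strings with an en-GB Collator under every
// strength/decomposition combination (class name is illustrative).
final class StrengthDemo {
    static final int[] STRENGTHS = {
        Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL};
    static final int[] DECOMPOSITIONS = {
        Collator.NO_DECOMPOSITION, Collator.CANONICAL_DECOMPOSITION, Collator.FULL_DECOMPOSITION};

    static List<String> sortWith(int strength, int decomposition) {
        // getInstance returns a fresh copy, so changing its settings is local.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("en-GB"));
        // Strength only controls which differences are significant...
        collator.setStrength(strength);
        // ...and decomposition only controls normalization before comparing.
        collator.setDecomposition(decomposition);
        List<String> strings = new ArrayList<>(Arrays.asList("Zverina", "Zulu", "\u017Durerka"));
        strings.sort(collator);
        return strings;
    }

    public static void main(String[] args) {
        for (int strength : STRENGTHS) {
            for (int decomposition : DECOMPOSITIONS) {
                System.out.println("strength=" + strength + ", decomposition="
                        + decomposition + ": " + sortWith(strength, decomposition));
            }
        }
    }
}
```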

Anze Rehar
  • Have you considered something like this? http://stackoverflow.com/a/2774370/6894338 Then just check whether either string contains an accented character: if so, compare with the "sl-SI" Collator, otherwise use the normal comparison. – Ash Oct 13 '16 at 12:21
  • I think the sl-SI Collator might be doing things correctly. All online resources seem to indicate that Ž is a separate letter that comes after Z in the Slovak alphabet. See, for instance, https://en.wikipedia.org/wiki/Slovak_orthography . – VGR Oct 13 '16 at 16:56
  • @AshFrench: I don't want to hardcode the solution and limit myself to a few accents. We also support many other locales (Arabic, Persian) with their own peculiarities, so this wouldn't be a general solution. – Anze Rehar Oct 13 '16 at 20:05
  • @VGR: True, in sl-SI Ž after Z is correct, since Ž is a completely different letter. One correction, though: this is the locale for Slovenian, not Slovak :) – Anze Rehar Oct 13 '16 at 20:08
  • My apologies. If SO permitted editing of old comments, I’d change the link to https://en.wikipedia.org/wiki/Slovene_alphabet . – VGR Oct 13 '16 at 20:16
