23

Does anybody know how to sort an array of words that contain special characters such as accents?

Arrays.sort(anArray);

puts 'Albacete' before 'Álava', but I want 'Álava' before 'Albacete'.

Thanks a lot

Rustam
JLLMNCHR
  • You have to write your own `Comparator` and pass it to `Arrays.sort()`. By default, Strings are sorted in their *natural order*. – TheLostMind Oct 17 '14 at 08:13

3 Answers

40

If you just want to sort the strings as if they didn't have the accents, you could use the following:

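Collections.sort(strs, new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        // NFD splits each accented letter into a base letter plus combining marks;
        // removing the marks (\p{M}) leaves only the base letters to compare.
        o1 = Normalizer.normalize(o1, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        o2 = Normalizer.normalize(o2, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return o1.compareTo(o2);
    }
});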
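
As a minimal sketch of the same idea applied to the array from the question: NFD on its own only splits an accented letter into a base letter plus combining marks, so this version also removes the marks (`\p{M}`) before comparing. The class and method names (`AccentInsensitiveSort`, `stripAccents`, `sortIgnoringAccents`) are made up for illustration, and the non-ASCII letters are written as Unicode escapes only to avoid source-encoding surprises.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Comparator;

public class AccentInsensitiveSort {

    // Decompose accented letters (NFD), then drop the combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Compares two strings by their accent-free forms.
    static final Comparator<String> IGNORING_ACCENTS =
            (a, b) -> stripAccents(a).compareTo(stripAccents(b));

    // Returns a sorted copy; the input array is left untouched.
    static String[] sortIgnoringAccents(String[] words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted, IGNORING_ACCENTS);
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00C1" is the letter A with an acute accent.
        String[] words = {"Albacete", "\u00C1lava", "Zaragoza", "\u00C1vila"};
        System.out.println(Arrays.toString(sortIgnoringAccents(words)));
    }
}
```

With this input, 'Álava' ends up before 'Albacete', followed by 'Ávila' and 'Zaragoza'. Letters without a canonical decomposition (the Polish "Ł", for example) are not affected by this approach.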


For more sophisticated use cases you will want to read up on java.text.Collator. Here's an example:

final Collator usCollator = Collator.getInstance(Locale.US); // create once, reuse for every comparison
Collections.sort(strs, new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        return usCollator.compare(o1, o2);
    }
});

If none of the predefined collation rules meet your needs, you can try using the java.text.RuleBasedCollator.
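
For the array case from the question, a `Collator` can be handed to `Arrays.sort` directly, because `Collator` itself implements `Comparator`. The following is only a sketch: the class and method names (`CollatorSort`, `sortWithCollator`, `sameBaseLetters`) are invented for this example, and `Locale.US` is carried over from the snippet above rather than being a recommendation for Spanish text.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollatorSort {

    // Returns a copy of the array sorted by the locale's collation rules.
    static String[] sortWithCollator(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        String[] sorted = words.clone();
        Arrays.sort(sorted, collator); // Collator implements Comparator<Object>
        return sorted;
    }

    // At PRIMARY strength only base letters count; accents and case are ignored.
    static boolean sameBaseLetters(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "\u00C1" is the letter A with an acute accent.
        String[] words = {"Albacete", "\u00C1lava", "Zaragoza", "\u00C1vila"};
        System.out.println(Arrays.toString(sortWithCollator(words, Locale.US)));
        System.out.println(sameBaseLetters("Alava", "\u00C1lava", Locale.US));
    }
}
```

Unlike stripping accents by hand, the collator still uses the accent as a tie-breaker when two words are otherwise identical, so the resulting order is deterministic.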

aioobe
  • (Note that you can change from `Collections.sort` to `Arrays.sort` if you're working with arrays and not lists.) – aioobe Oct 17 '14 at 08:33
  • The `Normalizer.Form.NFD` very nicely sorted the French "accented E" to appear after the regular "E". Thanks! – Someone Somewhere Sep 03 '18 at 14:50
  • FYI: I found a problem with `Normalizer.Form.NFD`: the Polish "Ł" is sorted after "Z", but it's supposed to come after "L". This surprised me a lot because all the other accented letters sorted fine! I wonder if it's a bug. – Someone Somewhere Sep 03 '18 at 15:21
  • Confirmed, it's a bug. To fix it, you could switch to https://github.com/gcardone/junidecode, but I didn't want to make my app larger, so I simply wrote a utility function: `return Normalizer.normalize(src, form).replaceAll("Ł","L");` – Someone Somewhere Sep 03 '18 at 16:01
  • @SomeoneSomewhere I know this is old, but would you mind sharing a link to the JDK bug with us? I was surprised to find this even with the newest JDK. – Petr Janeček Sep 08 '22 at 08:13
  • @PetrJaneček I'm sorry, but I don't have those notes any more. I faintly remember that Polish character being the only issue. – Someone Somewhere Sep 30 '22 at 00:47
1

You should take a look at RuleBasedCollator

RuleBasedCollator class is a concrete subclass of Collator that provides a simple, data-driven, table collator. With this class you can create a customized table-based Collator. RuleBasedCollator maps characters to sort keys.

RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages):

1. If a special collation rule controlled by a <modifier> is specified, it applies to the whole collator object.
2. All non-mentioned characters are at the end of the collation order.
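
To make the "customized table" idea more concrete, here is a small sketch that builds a `RuleBasedCollator` from a rule string. The rule string is invented for this example and covers only the letters a, l, ł and z; it is not a complete alphabet, and any letter it does not mention would sort after all the listed ones. The class and method names (`CustomRulesSort`, `sortWithRules`) are likewise hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

public class CustomRulesSort {

    // Hypothetical rule table: '<' starts a new letter (primary difference),
    // ';' marks an accent-level (secondary) difference and ',' a case-level
    // (tertiary) one. The escapes are the Polish l with stroke, lower and
    // upper case, placed right after l/L instead of after z.
    static final String RULES = "< a, A < l, L ; \u0142, \u0141 < z, Z";

    // Returns a copy of the array sorted according to RULES.
    static String[] sortWithRules(String[] words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(RULES);
            String[] sorted = words.clone();
            Arrays.sort(sorted, collator);
            return sorted;
        } catch (ParseException e) {
            // Only reached if the rule string itself is malformed.
            throw new IllegalArgumentException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        String[] words = {"Zza", "\u0141za", "Lala", "Ala"};
        System.out.println(Arrays.toString(sortWithRules(words)));
    }
}
```

With these rules, the word starting with "Ł" lands between "Lala" and "Zza", whereas a plain `String.compareTo` would put it last.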

Ruchira Gayan Ranaweera
0

Use a comparator like the one below :) and sort your list with it:

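import java.util.Comparator;
import org.apache.commons.lang3.StringUtils; // Apache Commons Lang 3

// Compares the accent-stripped forms of the two strings.
Comparator<String> accentIgnorantComparator =
        (o1, o2) -> StringUtils.stripAccents(o1).compareTo(StringUtils.stripAccents(o2));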
Panos Nikolos