5

I have a list of Unicode strings that I want to sort by first letter. The problem is that I don't know how to set up java.text.Collator so that it treats similar letters as different.

This is what I get now:

  • Rokiškis
  • Šakiai
  • Salantai
  • Šeduva
  • Šiauliai
  • Šilalė
  • Skuodas
  • Tauragė
  • Telšiai

This is what I want to get (a word beginning with Š should always come after every word beginning with S, regardless of the second letter):

  • Rokiškis
  • Salantai
  • Skuodas
  • Šakiai
  • Šeduva
  • Šiauliai
  • Šilalė
  • Tauragė
  • Telšiai
Rytis Alekna

2 Answers

1

We can create a class that extends Collator and override its compare method.

Here is an example:

import java.text.CollationKey;
import java.text.Collator;

public class MyCollator extends Collator {

    // Compare by UTF-16 code unit value, ignoring any locale rules.
    @Override
    public int compare(String source, String target) {
        return source.compareTo(target);
    }

    // Collation keys are not supported by this example.
    @Override
    public CollationKey getCollationKey(String source) {
        return null;
    }

    // A constant is a legal (if crude) hash code.
    @Override
    public int hashCode() {
        return 0;
    }

}

We can then use this new class to sort the list of strings:

Collator collator = new MyCollator();

Collections.sort(list, collator);

My test result is as follows:

  • Rokiškis
  • Salantai
  • Skuodas
  • Tauragė
  • Telšiai
  • Šakiai
  • Šeduva
  • Šiauliai
  • Šilalė

Note that in the result Š appears after T. This is because "Š".compareTo("T") > 0: String.compareTo compares UTF-16 code units, and Š (U+0160) has a higher value than T (U+0054).

I believe you can put some logic in the compare method to make Š appear just after S, but before T.

The above code was compiled and executed with JDK 1.5.

If you call Collections.sort(list) directly, you will get the same result as shown above.
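
The logic suggested above (Š right after S, but before T) could be sketched roughly as follows. This is only an illustration, not a replacement for a real locale-aware collator: the class name AlphabetOrderSketch, the ALPHABET constant, and the letter order in it are assumptions made for this example, and unlike the JDK 1.5 code above it needs Java 8 or later. Accented letters are written as Unicode escapes so that the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class AlphabetOrderSketch {

    // Assumed letter order (hypothetical, for illustration only).
    // The escape just after S is S-with-caron, which therefore ranks
    // between S and T.
    private static final String ALPHABET =
            "A\u0104BC\u010CDE\u0118\u0116FGHI\u012EYJKLMNOPRS\u0160TU\u0172\u016AVZ\u017D";

    // Position of a character in ALPHABET, ignoring case.
    // Characters outside the alphabet sort after all listed letters.
    private static int rank(char c) {
        int i = ALPHABET.indexOf(Character.toUpperCase(c));
        return i >= 0 ? i : ALPHABET.length() + c;
    }

    // Compare letter by letter using rank(); a shorter string that is
    // a prefix of a longer one sorts first.
    static int compare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int d = Integer.compare(rank(a.charAt(i)), rank(b.charAt(i)));
            if (d != 0) {
                return d;
            }
        }
        return Integer.compare(a.length(), b.length());
    }

    static final Comparator<String> BY_ALPHABET = AlphabetOrderSketch::compare;

    public static void main(String[] args) {
        List<String> towns = new ArrayList<>(Arrays.asList(
                "\u0160akiai", "Salantai", "Skuodas", "Taurag\u0117"));
        towns.sort(BY_ALPHABET);
        System.out.println(towns);
    }
}
```

Because every letter gets its own rank, all words starting with S are placed before all words starting with Š, no matter what the second letter is. The comparison ignores case, so strings that differ only in case compare as equal.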

Mengjun
  • Sorry, but I don't want to write my own Collator rules, because according to the docs the Java 6 Collator supports the locale of this language. So I want to know how to solve this problem in a clean way. – Rytis Alekna Oct 30 '13 at 07:16
0

So I tested every combination of Collator strength and decomposition, and nothing changed. What I found is that for my locale ("lt_LT") the sort order I was getting is actually grammatically correct.
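
For anyone who wants to repeat the experiment, a test along these lines could look like the sketch below. It is not the exact code that was run: the class name CollatorSettingsSketch and the helper sortWith are made up for this example. It uses only the standard Collator.setStrength and Collator.setDecomposition settings, and the order printed may depend on the JDK's locale data.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSettingsSketch {

    // Return a sorted copy of the input using the Lithuanian collator
    // configured with the given strength and decomposition mode.
    static List<String> sortWith(List<String> input, int strength, int decomposition) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("lt-LT"));
        collator.setStrength(strength);
        collator.setDecomposition(decomposition);
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // Accented letters are written as Unicode escapes.
        List<String> towns = Arrays.asList(
                "\u0160akiai", "Salantai", "\u0160eduva", "Skuodas", "Taurag\u0117");

        int[] strengths = {
            Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL
        };
        int[] decompositions = {
            Collator.NO_DECOMPOSITION, Collator.CANONICAL_DECOMPOSITION, Collator.FULL_DECOMPOSITION
        };

        // Print the order produced by every combination of settings.
        for (int strength : strengths) {
            for (int decomposition : decompositions) {
                System.out.println("strength=" + strength
                        + " decomposition=" + decomposition
                        + " -> " + sortWith(towns, strength, decomposition));
            }
        }
    }
}
```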

Rytis Alekna