I recently noticed that Java collation seems to ignore spaces.

I have a list of the following terms:

Amman Jost 
Ammann Heinrich 
Ammanner Josef 
Bär Walter 
Bare Werner 
Barr Burt 
Barraud Maurice

The order above reflects the desired ordering for Germany, i.e. taking spaces into account. However, Java collation using

Collator collator = Collator.getInstance(Locale.GERMANY);
Collections.sort(values, collator);

gives me the following order:

Amman Jost
Ammanner Josef
Ammann Heinrich
Bare Werner
Barraud Maurice
Barr Burt
Bär Walter

This is not what I expected: spaces are not taken into account (it looks like the case described here: Wikipedia Alphabetical order).

Does this mean that Java collation is not usable for this use case, or am I doing something wrong? Is there a way to make Java collation space-aware?

I would be glad for any comments or recommendations.

jhasenbe

2 Answers

You can customize the collation. Try looking at the source code to see how the Collator for the German locale is built, as described in this answer.

Then adapt it to your needs. The tutorial gives a starting point. But there is no need to do all the work yourself, since someone has already done it: see this blog post dealing with the exact same problem for Czech.

The essence of the solution linked above is:

String rules = ((RuleBasedCollator) Collator.getInstance(Locale.GERMANY)).getRules();
RuleBasedCollator correctedCollator 
    = new RuleBasedCollator(rules.replaceAll("<'\u005f'", "<' '<'\u005f'"));

This adds a rule for the space character just before the rule for underscore.

I confess I haven't tested this personally.
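
As a fuller sketch of the same idea, the snippet below wraps the rule change in a small helper. The names `SpaceAwareCollation`, `forLocale` and `sort` are made up for illustration. It assumes the JDK returns a `RuleBasedCollator` whose rules contain the underscore rule `<'_'`; if either assumption fails, it falls back to the unmodified collator.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the JDK.
final class SpaceAwareCollation {

    // The underscore rule as it is assumed to appear in the rules string.
    private static final String UNDERSCORE_RULE = "<'_'";

    /** Returns a collator that treats a space as a primary character, when possible. */
    static Collator forLocale(Locale locale) {
        Collator base = Collator.getInstance(locale);
        if (!(base instanceof RuleBasedCollator)) {
            return base; // no rules to customize
        }
        String rules = ((RuleBasedCollator) base).getRules();
        if (!rules.contains(UNDERSCORE_RULE)) {
            return base; // the assumption about the rule text does not hold
        }
        // Insert a rule for the space character just before the underscore rule.
        String patched = rules.replace(UNDERSCORE_RULE, "<' '" + UNDERSCORE_RULE);
        try {
            return new RuleBasedCollator(patched);
        } catch (ParseException e) {
            throw new IllegalStateException("Patched collation rules are invalid", e);
        }
    }

    /** Returns a sorted copy of the given values; the input list is left untouched. */
    static List<String> sort(List<String> values, Locale locale) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(forLocale(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "B\u00e4r" is "Bar" with an a-umlaut, escaped to keep the source ASCII-only.
        List<String> names = Arrays.asList(
                "Barraud Maurice", "Barr Burt", "Bare Werner", "B\u00e4r Walter",
                "Ammanner Josef", "Ammann Heinrich", "Amman Jost");
        sort(names, Locale.GERMANY).forEach(System.out::println);
    }
}
```

Because the helper takes a `Locale`, the same patch can be attempted for other locales, though whether the result matches local sorting conventions would have to be checked for each locale.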

Andrew Spencer
  • Thanks for your answer and the useful links. The bigger issue is that a web-based application that is supposed to collate by the user's locale would potentially need to customize many locales. – jhasenbe May 15 '13 at 20:16
  • Then you should write it yourself: see if my proposal below can help – JonasVautherin May 16 '13 at 07:31
  • @jhasenbe Yes it's not satisfactory. You could probably hack something to perform the same change on any locale with similar rules, but it would be a hack – Andrew Spencer May 16 '13 at 21:07

If you cannot modify the locale for some reason, I would suggest writing the comparison yourself. Here are some ideas; treat the code as an outline rather than a finished implementation:

  • Instead of having a list of Strings, create your own objects that implement Comparable:

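    public class myString implements Comparable<myString> {
        private final String name;
        private final String[] words; // the name split on whitespace
    
        public myString(String name) {
            this.name = name;
            this.words = name.trim().split("\\s+");
        }
    }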
    
  • Then you will need to implement compareTo (see an example here):

    public int compareTo(myString compareMyString) {
        ...
    }
    
  • Now comes the trickier part:

    • In order to compare your strings, you will need to split them (this will result in an array of Strings). For instance:

      // Original String
      "Barr Burt"
      
      // Split String
      [0]: "Barr"
      [1]: "Burt"
      
    • You will need to compare the words one after the other. Create a function along these lines (here "this.words[i]" is the i-th word of "this.name"):

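      // Needs java.text.Collator and java.util.Locale.
      public int compareWords(myString compareMyString, int i)
      {
          // One name ran out of words: the one with fewer words comes first.
          if (i >= this.words.length || i >= compareMyString.words.length)
              return Integer.compare(this.words.length, compareMyString.words.length);
      
          int result = Collator.getInstance(Locale.GERMANY)
                  .compare(this.words[i], compareMyString.words[i]);
          if (result != 0)
              return result; // negative: "this" comes before "compareMyString"
      
          return compareWords(compareMyString, i + 1); // equal so far, try the next word
      }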
      
    • And then compareTo:

      public int compareTo(myString compareMyString) {
          return compareWords(compareMyString, 0);
      }
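
Putting these pieces together, one possible version uses a `Comparator<String>` instead of a wrapper class. It compares names word by word and hands each word to the locale's `Collator`, so umlauts are still handled by the locale rules. The class and method names (`WordwiseCollation`, `comparator`, `sortByWords`) are hypothetical, and splitting on whitespace is an assumption about what counts as a word.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the JDK.
final class WordwiseCollation {

    /** Builds a comparator that compares strings one whitespace-separated word at a time. */
    static Comparator<String> comparator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return (left, right) -> {
            String[] a = left.trim().split("\\s+");
            String[] b = right.trim().split("\\s+");
            int shared = Math.min(a.length, b.length);
            for (int i = 0; i < shared; i++) {
                int result = collator.compare(a[i], b[i]);
                if (result != 0) {
                    return result; // the first differing word decides
                }
            }
            // All shared words are equal: the name with fewer words comes first.
            return Integer.compare(a.length, b.length);
        };
    }

    /** Returns a sorted copy of the given values; the input list is left untouched. */
    static List<String> sortByWords(List<String> values, Locale locale) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(comparator(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "B\u00e4r" is "Bar" with an a-umlaut, escaped to keep the source ASCII-only.
        List<String> names = Arrays.asList(
                "Barraud Maurice", "Barr Burt", "Bare Werner", "B\u00e4r Walter",
                "Ammanner Josef", "Ammann Heinrich", "Amman Jost");
        sortByWords(names, Locale.GERMANY).forEach(System.out::println);
    }
}
```

This does not touch the collation rules at all, so it should work for any locale the JDK provides a collator for. The trade-off is that each comparison splits both strings again, which may matter for very large lists.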
      
JonasVautherin