
Strings that contain numbers are sorted differently from one language to another. For example, in English, digits come before letters in ascending order, but in German, digits sort after letters.

I tried to sort strings using a Collator as follows:

private Collator collator = Collator.getInstance(Locale.GERMANY);
collator.compare(str1, str2);

But this comparison does not take the digits-after-letters rule into account.

Does anyone have an idea why Java is not taking this rule (digits after letters) into account? For the time being, I am using a RuleBasedCollator as follows:

private final String sortOrder = "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z < 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9";

private Collator collator = new RuleBasedCollator(sortOrder);
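For illustration, here is a minimal, self-contained sketch of how such a rule string can be used to sort a list. The class name `LettersBeforeDigitsSort` and its helper methods are made up for this example, and the rule string is the simplified one from above (no umlauts or ß). Note that the `RuleBasedCollator` constructor throws a checked `ParseException`, so it cannot be called from a plain field initializer without handling it.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the name and helpers are assumptions for this sketch.
class LettersBeforeDigitsSort {

    // Same simplified rule string as in the question: letters first, then digits.
    static final String SORT_ORDER =
            "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I"
            + " < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R"
            + " < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z"
            + " < 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9";

    // Builds the collator, converting the checked ParseException into an
    // unchecked one so callers do not have to declare it.
    static RuleBasedCollator newCollator() {
        try {
            return new RuleBasedCollator(SORT_ORDER);
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Returns a sorted copy of the input, leaving the input untouched.
    static List<String> sorted(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(newCollator()); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("1a", "b", "a1", "ab")));
    }
}
```

With this rule string, "ab" should sort before "a1", and any string starting with a digit should land after all strings starting with a letter.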
bluish
Amir
  • Is it deliberate that you don't have Umlauts and the Sharp-s (äöüß) in your sort order? I'd say they are important for having a German collator. – Joachim Sauer Oct 08 '12 at 09:30
  • Yes, for the test case I have omitted umlauts and special characters. I just wanted to keep it very simple. – Amir Oct 08 '12 at 09:33
  • Also: which rules do you follow that sort digits after the other characters? There are several different collations for German and at least some of those sort numbers first. – Joachim Sauer Oct 08 '12 at 09:35
  • I have just tried the Locale.GERMANY collation. Can you point me to a collation which sorts digits after letters? – Amir Oct 08 '12 at 09:38
  • If you are using Java 7, you can set a variant on your `Locale` which can be a BCP 47 extension (cf. http://docs.oracle.com/javase/tutorial/i18n/locale/create.html, and for BCP 47 http://docs.oracle.com/javase/tutorial/i18n/locale/extensions.html). AFAIK, there's a reorder setting for collation, but I've never actually worked with this. – s.d Oct 10 '12 at 10:52
  • What is your source for "*But, in German, digits are ascendant sorted after letters.*"? – assylias Nov 27 '12 at 11:43

2 Answers


You can check/debug the source code to see why nothing changes:

Collator.getInstance(Locale.GERMANY);

calls the following piece of code:

public static synchronized
Collator getInstance(Locale desiredLocale)
{
    // Snipping some code here
    String colString = "";
    try {
        ResourceBundle resource = LocaleData.getCollationData(desiredLocale);

        colString = resource.getString("Rule");
    } catch (MissingResourceException e) {
        // Use default values
    }
    try
    {
        result = new RuleBasedCollator( CollationRules.DEFAULTRULES +
                                        colString,
                                        CANONICAL_DECOMPOSITION );
    }
// Snipping some more code here

Here you can see that the locale-specific rules (colString, which is empty in your case anyway) are appended after the defaults (CollationRules.DEFAULTRULES).

And, as you have discovered, the defaults place the numerics first:

  // NUMERICS

    + "<0<1<2<3<4<5<6<7<8<9"
    + "<\u00bc<\u00bd<\u00be"   // 1/4,1/2,3/4 fractions

    // NON-IGNORABLES
    + "<a,A"
    + "<b,B"
    + "<c,C"
    + "<d,D"
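A quick way to confirm this without a debugger is to ask the stock collator directly. The sketch below is only an illustration: the class `DefaultGermanOrderCheck` and its method names are invented for this example. It also guards the cast to `RuleBasedCollator`, since the API only promises a `Collator`.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper class; names are assumptions made for this sketch.
class DefaultGermanOrderCheck {

    // Returns true if the stock German collator sorts `digit` before `letter`.
    static boolean digitSortsFirst(String digit, String letter) {
        Collator german = Collator.getInstance(Locale.GERMANY);
        return german.compare(digit, letter) < 0;
    }

    // Returns the rule string behind the German collator, or null if the
    // runtime returns a Collator that is not rule based.
    static String germanRules() {
        Collator german = Collator.getInstance(Locale.GERMANY);
        return (german instanceof RuleBasedCollator)
                ? ((RuleBasedCollator) german).getRules()
                : null;
    }

    public static void main(String[] args) {
        System.out.println("'1' before 'a': " + digitSortsFirst("1", "a"));
        String rules = germanRules();
        System.out.println("rule string available: " + (rules != null));
    }
}
```

If the rule string is available, it can be searched or printed to see where the digits sit relative to the letters.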
Jasper

I implement my special sort with the following code:

import java.util.Arrays;
import java.util.Comparator;

public class SpecialSort implements Comparator<String> {

    private static final char[] knownChars = {'A', 'Ä', 'a', 'ä', 'á', 'à', 'â', 'å', 'ã', //
            'B', 'b', 'в',//
            'C', 'Ç', 'c', 'ç',//
            'D', 'd',//
            'E', 'É', 'e', 'ë', 'é', 'è', 'ê',//
            'F', 'f',//
            'G', 'g',//
            'H', 'h',//
            'I', 'i', 'ï', 'î', 'í', 'ì',//
            'J', 'j',//
            'K', 'k',//
            'L', 'l',//
            'M', 'm',//
            'N', 'n', 'ñ',//
            'O', 'Ö', 'Ó', 'Ò', 'o', 'о', 'ö', 'ó', 'ò', 'ô',//
            'P', 'p', 'р', 'π',//
            'Q', 'q',//
            'R', 'r',//
            'S', 's', 'ß', 'β',//
            'T', 't', 'т',//
            'U', 'Ü', 'u', 'ü', 'ú', 'ù', 'û',//
            'V', 'v',//
            'W', 'w',//
            'X', 'x',//
            'Y', 'y', 'ÿ',//
            'Z', 'z',//
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',//
            ' ', ',', '+', '-', '–', '*', '_', '#', '´',//
            '½', '¼', '@', '¹', '²', '>', '<', '’', '“', '„', '³',//
            '\'', '`', '"', '§', '○',//
            '?', '!', '\t', ((char) 10), '♕', '/', '\\',//
            '.', '·', ':', ';', '=', '&', '¶',//
            '(', ')', '[', ']',//
            ' ', '%', '»', '«', '®', '€', '£', 'ø',//
            '°', 'и', 'щ', '瘐', 'ɸ'//

    };

    private static final int[] mapping = new int[0x10000];
    // If true, compare() falls back to String.compareTo(); if false, it uses the custom order above.
    public static boolean simpleSort = true;
    // If true, the case of the first letter is ignored (it is lowercased before comparing).
    public static boolean firstLetterIgnoreUpperCase = false;
    private static int lastGoodChar;

    static {
        Arrays.fill(mapping, Integer.MAX_VALUE);
        for (int i = 0; i < knownChars.length; i++) {
            if (knownChars[i] == ' ') {
                lastGoodChar = i;
            }
            mapping[knownChars[i]] = i;
        }
    }

    public static int staticCompare(String one, String two) {
        if (simpleSort)
            return one.compareTo(two);
        char[] chars1 = one.toCharArray();
        char[] chars2 = two.toCharArray();
        if (firstLetterIgnoreUpperCase) {
            if (chars1.length > 0)
                chars1[0] = ("" + chars1[0]).toLowerCase().charAt(0);
            if (chars2.length > 0)
                chars2[0] = ("" + chars2[0]).toLowerCase().charAt(0);
        }
        // Single-element arrays act as mutable read positions for each string.
        int[] ref1 = new int[1];
        int[] ref2 = new int[1];
        do {
            int c1 = getCharValue(chars1, ref1);
            int c2 = getCharValue(chars2, ref2);
            if (c1 != c2)
                return c1 - c2;
        } while ((ref1[0] < chars1.length) && (ref2[0] < chars2.length));
        // No difference so far: the string with characters left sorts last.
        if (ref1[0] < chars1.length)
            return Integer.MAX_VALUE;
        if (ref2[0] < chars2.length)
            return Integer.MIN_VALUE;
        return 0; // both strings are exhausted, so they compare as equal
    }

    // Returns the rank of the next character that takes part in the ordering,
    // skipping unknown characters and those at or after lastGoodChar.
    // Returns -1 at the end of the string so that it ranks below every character.
    private static int getCharValue(char[] all, int[] index) {
        if (index[0] == all.length) return -1;
        int ord = mapping[all[index[0]++]];
        if (ord < lastGoodChar)
            return ord;
        return getCharValue(all, index);
    }

    @Override
    public int compare(String one, String two) {
        return staticCompare(one, two);
    }
}
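The core idea here is a lookup table that maps each character to its rank, with characters outside the table skipped during comparison. A much smaller sketch of that technique is shown below. The class `RankTableComparator` and its shortened `ORDER` string (unaccented Latin letters, then digits) are assumptions made for this illustration, not a drop-in replacement for the class above.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical, reduced version of a rank-table comparator.
class RankTableComparator implements Comparator<String> {

    // Assumed order for this sketch: letters first, then digits.
    private static final String ORDER =
            "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz0123456789";

    // RANK[c] is the position of c in ORDER, or -1 if c is not ranked.
    private static final int[] RANK = new int[0x10000];

    static {
        Arrays.fill(RANK, -1);
        for (int i = 0; i < ORDER.length(); i++) {
            RANK[ORDER.charAt(i)] = i;
        }
    }

    // Returns the index of the first ranked character at or after `from`,
    // or s.length() if there is none.
    private static int nextRanked(String s, int from) {
        int i = from;
        while (i < s.length() && RANK[s.charAt(i)] < 0) {
            i++;
        }
        return i;
    }

    @Override
    public int compare(String a, String b) {
        int i = nextRanked(a, 0);
        int j = nextRanked(b, 0);
        while (i < a.length() && j < b.length()) {
            int diff = RANK[a.charAt(i)] - RANK[b.charAt(j)];
            if (diff != 0) {
                return diff;
            }
            i = nextRanked(a, i + 1);
            j = nextRanked(b, j + 1);
        }
        if (i < a.length()) return 1;   // a has ranked characters left
        if (j < b.length()) return -1;  // b has ranked characters left
        return 0;                       // same sequence of ranked characters
    }
}
```

It can be passed to `List.sort` or `Collections.sort` like any other `Comparator<String>`. Because unranked characters are skipped, strings such as "a-b" and "ab" compare as equal in this sketch.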
Procrastinator
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – L_Cleo Jun 17 '23 at 10:53