
Is there a Collator implementation with the same characteristics as MySQL's utf8_general_ci? I need a collator that is case-insensitive and does not distinguish German umlauts such as ä from the base vowel a.

Background: We recently encountered a bug caused by a wrong collation on one of our tables. The column used utf8_general_ci where utf8_bin would have been correct, and it had a unique index. Because utf8_general_ci does not distinguish between words like pöker and poker, such rows were merged, which was not intended. I now need to implement a module for our Java application that repairs the affected rows.

Benjamin

1 Answer


You could use the following collator:

Collator collator = Collator.getInstance();
collator.setStrength(Collator.PRIMARY);

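A collator with this strength treats only primary differences (different base letters) as significant, so it ignores both case and accents during comparison.

Consider an example that compares the same two strings at each of the four strength levels: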
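As a minimal sketch of how this applies to the words from the question, the snippet below builds a PRIMARY-strength collator and checks a few pairs. The class name `PrimaryCollatorSketch`, the helper names, and the choice of `Locale.ROOT` are assumptions made for illustration and are not part of the original answer. A fixed locale is used only so that the result does not depend on the machine's settings. This approximates "case-insensitive and accent-insensitive"; it is not verified to match utf8_general_ci for every character.

```java
import java.text.Collator;
import java.util.Locale;

final class PrimaryCollatorSketch {

    // Builds a collator that ignores case and accent differences.
    // Locale.ROOT is an assumption chosen here for reproducible behaviour.
    static Collator newPrimaryCollator() {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Returns true when the collator reports no primary difference.
    static boolean sameAtPrimaryStrength(String a, String b) {
        return newPrimaryCollator().compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "p\u00F6ker" is "pöker", escaped to avoid source-encoding issues.
        System.out.println(sameAtPrimaryStrength("p\u00F6ker", "poker"));
        System.out.println(sameAtPrimaryStrength("Poker", "poker"));
        System.out.println(sameAtPrimaryStrength("poker", "joker"));
    }
}
```

The first two pairs differ only by an accent or by case, so they should compare as equal; the last pair differs in a base letter and should not.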

import java.text.Collator;

public class CollatorStrengthDemo {
    public static void main(String[] args) {
        System.out.println(compare("abc", "ÀBC", Collator.PRIMARY));   // base char
        System.out.println(compare("abc", "ÀBC", Collator.SECONDARY)); // base char + accent
        System.out.println(compare("abc", "ÀBC", Collator.TERTIARY));  // base char + accent + case
        System.out.println(compare("abc", "ÀBC", Collator.IDENTICAL)); // base char + accent + case + bits
    }

    // Compares two strings with the default-locale collator at the given strength.
    private static int compare(String first, String second, int strength) {
        Collator collator = Collator.getInstance();
        collator.setStrength(strength);
        return collator.compare(first, second);
    }
}

The output is:

0
-1
-1
-1
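For the repair task described in the question, one possible use of such a collator is to group values that it considers equal, which are the candidates that a unique index under utf8_general_ci may have merged. The following is only a sketch: the class name `CollisionFinderSketch`, the method name, the sample words, and `Locale.ROOT` are all assumptions, and the grouping follows Java's collation rules rather than MySQL's.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class CollisionFinderSketch {

    // Groups values that a PRIMARY-strength collator treats as equal.
    // The map key is the first value seen for each group.
    static Map<String, List<String>> groupByPrimaryStrength(List<String> values) {
        Collator collator = Collator.getInstance(Locale.ROOT); // assumed locale
        collator.setStrength(Collator.PRIMARY);

        // Collator implements Comparator, so the TreeMap treats keys that
        // compare as 0 (same base letters) as one and the same key.
        Map<String, List<String>> groups = new TreeMap<>(collator);
        for (String value : values) {
            groups.computeIfAbsent(value, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; "p\u00F6ker" is "pöker".
        List<String> words = Arrays.asList("poker", "p\u00F6ker", "Poker", "joker");
        for (Map.Entry<String, List<String>> e : groupByPrimaryStrength(words).entrySet()) {
            if (e.getValue().size() > 1) {
                System.out.println("Possible collision: " + e.getValue());
            }
        }
    }
}
```

Using the collator as the map's comparator avoids pairwise comparison of all values; for large tables, `Collator.getCollationKey` could serve as a hashable grouping key instead.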

Have a look at these links for more information:

http://www.javapractices.com/topic/TopicAction.do?Id=207
https://docs.oracle.com/javase/7/docs/api/java/text/Collator.html#PRIMARY

Ilya Patrikeev
  • Note that by using `Collator.getInstance();` you are leaving it to circumstances what collator you actually get... I recommend choosing and explicitly specifying a `Locale`... The question then becomes... what locale? As it stands this code will pick a French or German locale if the computer it's running on is set to those settings... Might be fine, or might require your user to change their Windows settings just to get the correct result in your program... – Stijn de Witt Jun 14 '16 at 19:53
  • Also see this blog post: [Using MySQL Collations in Java](http://techblog.molindo.at/2009/10/using-mysql-collations-in-java.html) – Stijn de Witt Jun 14 '16 at 19:58
  • Also see this SO question: http://stackoverflow.com/questions/33999947/java-sorting-is-not-the-same-with-mysql-sorting – Stijn de Witt Jun 14 '16 at 20:08