Almost 8 years later, my own recommendation based on some hard learnings:
Use no collation at all (default for H2 databases).
Rationale: Using a collation will produce some really unexpected results and bugs.
Pitfall: UNIQUE constraints
By far the most common unique constraints i saw in daily business was to enforce unique (firstname, lastname). Typically, case should be ignored (prevent both 'thomas müller' and 'Thomas Müller'), but not umlauts (allow both 'Thomas Müller' and 'Thomas Muller').
It might be tempting to use a collation strength SECONDARY
setting to achieve this (case-insensitive but umlaut-sensitive). Don't. Use VARCHAR_IGNORECASE
columns instead.
{
// NOT recommended: using SECONDARY collation
Statement s = DriverManager.getConnection("jdbc:h2:mem:", "test", "test").createStatement();
s.execute("SET COLLATION ENGLISH STRENGTH SECONDARY");
s.execute("CREATE TABLE test ( name VARCHAR )");
s.execute("ALTER TABLE test ADD CONSTRAINT unique_name UNIQUE(name)");
s.execute("INSERT INTO test (name) VALUES ('Müller')");
s.execute("INSERT INTO test (name) VALUES ('Muller')");
// s.execute("INSERT INTO test (name) VALUES ('muller')" /* will fail */);
}
{
// recommended: no collation, using VARCHAR_IGNORECASE instead of VARCHAR column
Statement s = DriverManager.getConnection("jdbc:h2:mem:", "test", "test").createStatement();
s.execute("CREATE TABLE test ( name VARCHAR_IGNORECASE )");
s.execute("ALTER TABLE test ADD CONSTRAINT unique_name UNIQUE(name)");
s.execute("INSERT INTO test (name) VALUES ('Müller')");
s.execute("INSERT INTO test (name) VALUES ('Muller')");
// s.execute("INSERT INTO test (name) VALUES ('muller')" /* will fail */);
}
Pitfall: Searching / WHERE clauses
Recommendation: The default behavior without collation is just fine, and behaves as expected. For more fuzzy searching use your own code search or a library like Lucene.
SECONDARY
collation strength will match even if case is different. You will not expect that behavior when using SELECT WHERE name = '...'
, because you will forget all about your collation setting.
{
Statement s = DriverManager.getConnection("jdbc:h2:mem:", "test", "test").createStatement();
s.execute("SET COLLATION ENGLISH STRENGTH SECONDARY");
s.execute("CREATE TABLE test ( name VARCHAR )");
s.execute("INSERT INTO test (name) VALUES ('Thomas Müller')");
ResultSet rs = s.executeQuery("SELECT count(*) FROM test WHERE name = 'Thomas müller'" /* different case */);
rs.next();
/* prints 1 (!) */ System.out.println(rs.getLong(1));
}
PRIMARY
collation strength will match even if SPACES are different. Would you believe the English primary collation ignores spaces? Check out this nugget: https://stackoverflow.com/a/16567963/1124509
{
Statement s = DriverManager.getConnection("jdbc:h2:mem:", "test", "test").createStatement();
s.execute("SET COLLATION ENGLISH STRENGTH PRIMARY");
s.execute("CREATE TABLE test ( name VARCHAR )");
s.execute("INSERT INTO test (name) VALUES ('Thomas Müller')");
ResultSet rs = s.executeQuery("SELECT count(*) FROM test WHERE name = 'ThomasMüller'" /* no space! */);
rs.next();
/* prints 1 (!) */ System.out.println(rs.getLong(1));
}
Sorting / ORDER BY clauses
The default ordering without collation is not really useful in real-world scenarios, as it will sort according to strict string comparison. Solve this by first loading the data from the database, and then order / sort it with code.
Personally, I mostly use an English primary strength collator with the spaces problem fixed. Works fine even for non-English text columns.
But you might also need to use a custom comparator to satisfy more difficult requirements like natural or intuitive sort orders, e.g. sort like windows explorer, or semantic versioning.
{
Statement s = DriverManager.getConnection("jdbc:h2:mem:", "test", "test").createStatement();
s.execute("CREATE TABLE test ( name VARCHAR )");
s.execute("INSERT INTO test (name) VALUES ('é6')");
s.execute("INSERT INTO test (name) VALUES ('e5')");
s.execute("INSERT INTO test (name) VALUES ('E4')");
s.execute("INSERT INTO test (name) VALUES ('ä3')");
s.execute("INSERT INTO test (name) VALUES ('a2')");
s.execute("INSERT INTO test (name) VALUES ('A1')");
ResultSet rs = s.executeQuery("SELECT name FROM test ORDER BY name");
List<String> names = new ArrayList<>();
while(rs.next()) {
names.add(rs.getString(1));
}
// not very useful strict String.compareTo() result: [A1, E4, a2, e5, ä3, é6]
System.out.print(names);
String rules = ((RuleBasedCollator) Collator.getInstance(new Locale("en", "US"))).getRules();
Collator collator = new RuleBasedCollator(rules.replaceAll("<'\u005f'", "<' '<'\u005f'"));
collator.setStrength(Collator.PRIMARY);
names.sort((a, b) -> collator.compare(a, b));
// as humans usually expect it in a name list / table: [A1, a2, ä3, E4, e5, é6]
System.out.print(names);
}
How to check if your H2 database is using a collation?
Look at the SETTINGS table. If no collation is set, there will be no entry in the table.
