7

I need to be able to store characters like \xF0\x9F\x94\xA5 in my database, which, according to this post need UTF8mb4 encoding.

So I set up my database with

CREATE DATABASE `myDB` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci

and verified in the MySQL shell if it was effective:

SHOW FULL COLUMNS FROM myTable;

+---------+------------------+--------------------+----
| Field   | Type             | Collation          | ...
+---------+------------------+--------------------+-----
| id      | int(10) unsigned | NULL               | ...   
| myColumn| text             | utf8mb4_general_ci | ...
+---------+------------------+--------------------+-----

So far so good.

After running my program, I got this Exception:

Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.3.2.v20111125-r10461): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'
Error Code: 1267

For the record: I am using the Java Persistency API (JPA) in my webapplication with GlassFish 3.1. The Exception gets thrown when executing a named query:

@NamedQuery(name = "myTable.findByMyColumn", query = "SELECT c FROM myTable c WHERE c.myColumn LIKE :myColumn")

However, it appears that the error only occurs when the queried String actually contains those weired emoji characters (\xF0\x9F\x94\xA5)

Call: SELECT id, myColumn FROM myDB.myTable WHERE myColumn LIKE ?
bind => [Something something Lorem Ipsum ]  

So I thought, that somewhere might still be a utf8_general_ci set up and I tried putting the COLLATION directly into the query (as suggested in this post here)

@NamedQuery(name = "myTable.findByMyColumn", query = "SELECT c COLLATE utf8mb4_general_ci FROM myTable c WHERE c.myColumn LIKE :myColumn")

But still nothing.

Then I tried putting the collation directly in the connection (in GlassFish I am using a connection_pool) as I read here

characterEncoding, UTF8mb4

But GlassFish said only Connection could not be allocated because: Unsupported character encoding 'UTF8mb4'

The last thing I did was checking the Database System (I am using MariaDB)

show variables WHERE variable_name like "col%";
+----------------------+------------------+
| Variable_name        | Value            |
+----------------------+------------------+
| collation_connection | utf8_general_ci  |
| collation_database   | utf32_general_ci |
| collation_server     | utf8_general_ci  |
+----------------------+------------------+

And now I am completely lost...

What can I do to use utf8mb4 or utf-32 or anything else for that matter that is more advanced that simple UTF-8 ?

Community
  • 1
  • 1
GameDroids
  • 5,584
  • 6
  • 40
  • 59
  • 2
    connection pool character encoding fails because in java, utf-8mb4 is just UTF-8. I think it is the connection which is not set to the right collation. Why don't you try to execute a `SET NAMES 'utf8mb4'` query during initialization of the context (or just before your `where .. like` query for testing purposes)? – guido Oct 09 '14 at 17:12
  • Unbelievable, that little trick of yours, did the trick! The database is accepting the weired characters without complaining :) This may not be the most efficient way, but now I could just execute this: `entityManager.createNativeQuery("SET NAMES 'utf8mb4'");` whenever I expect text that needs more than utf8. Maybe you can post this an answer, so I can accept it – unless you have more good ideas – GameDroids Oct 09 '14 at 17:50
  • you can also set that in `my.cnf` if you prefer, but will apply to all connections (dunno if with any drawbacks) – guido Oct 09 '14 at 17:55
  • also, can you update your jdbc client to a newer version? – guido Oct 09 '14 at 17:57
  • This is meant to run on a remote server which I can't update myself. So all I have is a MariaDB 5.5.36 server and GlassFish 3.1.2.2. However, I won't be using the database for anything else so I can just edit `my.cnf` ! – GameDroids Oct 09 '14 at 18:06

1 Answers1

12

You do not need any change in the Java side, as utf8mb4 is just UTF-8 in Java.

Instead, as you can see here:

show variables WHERE variable_name like "col%";
+----------------------+------------------+
| Variable_name        | Value            |
+----------------------+------------------+
| collation_connection | utf8_general_ci  |
| collation_database   | utf32_general_ci |
| collation_server     | utf8_general_ci  |
+----------------------+------------------+

your connection setting is still utf8_general_ci; to set it at the connection level, one option is to execute the (mysql specific) query:

SET NAMES='utf8mb4'

before any attempt to use the utf8mb4 collation; or, generally for the mysql server, in /etc/my.cnf:

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

Another option without changing the connection string, is using a jdbc driver version >= 5.1.13: http://www.opensubscriber.com/message/java@lists.mysql.com/14151747.html

guido
  • 18,864
  • 6
  • 70
  • 95
  • Thank you. Interesting link - I use `mysql-connector-java-5.1.32` so I should be fine when changing the `my.conf` – GameDroids Oct 09 '14 at 18:39