3

Regarding MySQL database and Java JDBC connection encoding. Database has been converted to utf8mb4 and utf8mb4_unicode_ci as shown this is the results of SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; from the JDBC connection.

+--------------------------+--------------------+
|      Variable_name       |       Value        |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

From the MySQL Workbench and the terminal connecting directly to the DB I can see the Unicode character í with its correct hex value of c3 ad

+------------------------------+
| HEX(location.name)           |
+------------------------------+
| C3AD                         |
+------------------------------+

JDBC Connection settings: useUnicode=true&characterEncoding=UTF-8

Using HikariCP with the config of

config.addDataSourceProperty("useUnicode", "true"); config.addDataSourceProperty("characterEncoding", "utf-8"); config.setConnectionInitSql("SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci");

Using mysql-connector-java:8.0.11

From the results of querying the table in question using the JDBC connection the í character is returned as within Postman. And the is returned as ? within Postman.

Which leads me to believe that my connection during reading is not UTF-8 according to stackoverflow.com/questions/38363566 how would I detect this?

Database and application have been reset to apply settings, if it was needed.

Erwin Bolwidt
  • 30,799
  • 15
  • 56
  • 79

3 Answers3

2

characterEncoding=utf-8 is incompatible with utf8mb4. Use character_set_server=utf8mb4 in your JDBC URL instead, or config.addDataSourceProperty("character_set_server", "utf8mb4");. Do not use characterEncoding at all.

From the MySQL Connection/J developer guide → Using Character Sets → Setting the Character Encoding:

… to use the 4-byte UTF-8 character set with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding out of the Connector/J connection string.

And right below that:

Warning

In order to use the utf8mb4 character set for the connection, the server MUST be configured with character_set_server=utf8mb4; if that is not the case, when UTF-8 is used for characterEncoding in the connection string, it will map to the MySQL character set name utf8, which is an alias for utf8mb3.

VGR
  • 40,506
  • 4
  • 48
  • 63
0

In addition to following the help from VGR I also was using a normal PrintWriter to send the responses which doesn't allow for UTF-8 encoding. Instead of

PrintWriter out = response.getWriter();
out.println(res);
out.flush();

Replaced with

response.getOutputStream().write(res.toString().getBytes("UTF-8"));
0

"í character is returned as " is a different issue than " is returned as ?"

The former usually happens when the byte(s) for í were not encoded as UTF-8. Note that, in MySQL, utf8mb3 and utf8mb4 are identical for the correct encoding of that character and all other European characters. Fixing the connection (as discussed by VGR) will probably not fix it. Black diamonds seem to occur only when the browser is not set to UTF-8 (Unicode).

The "pile of poo" is available only in utf8mb4, not utf8mb3. So, assuming the client correctly had hex F09F92A9, then the connection parameters (see VGR) were probably causing the problem.

(More discussion is in the link you provided.)

Rick James
  • 135,179
  • 13
  • 127
  • 222