I'm having a problem reading UTF-8 data from a MySQL database using MySQL Connector/J 8.0.19. Scandinavian letters such as "äö" are replaced with unknown characters. I have already made sure that the database, its tables, and its columns use utf8mb4. I then added `useUnicode=true&characterEncoding=UTF-8` to the JDBC connection string, but the letters are still garbled. I'm running MySQL CE 8 in a Docker container. The letters display correctly when I run the SELECT queries from the command line.

Mikael H.
  • Where are you printing the output? – areus Apr 14 '20 at 16:00
  • I’m printing the output from the getter right after the value is set from `ResultSet`. – Mikael H. Apr 14 '20 at 18:52
  • If you are printing to System.out, then ensure it is using UTF-8: `PrintStream out = new PrintStream(System.out, true, "UTF-8");` - and then: `out.println("读写汉字");`. Better yet, use `import static java.nio.charset.StandardCharsets.UTF_8` instead of "UTF-8". I'm sure this is answered in other SO questions but cannot find a good example right now. – andrewJames Apr 14 '20 at 19:00
  • I’m developing a REST API with Spring Boot, and I first saw this problem in a JSON response (`ResponseEntity`). I only printed with a logger to see where the problem originates. Is this behavior normal for JDBC, and what would be the best solution? – Mikael H. Apr 14 '20 at 19:42
  • Select HEX(col) so we can see what was stored. What do you mean by "unknown characters"? See https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Apr 14 '20 at 21:39
  • I just tried what @andrewjames suggested, but my "äö" still gets replaced with "ä" characters. I printed the result directly from the `ResultSet`. As for `SELECT HEX(col_name)`, here is one problematic hex string: "596C6569736CC383C2A4C383C2A46BC383C2A47269". Even an online converter cannot display that value correctly. The collation of this table appears to be `utf8mb4_unicode_ci`, and the same applies to its columns. – Mikael H. Apr 15 '20 at 08:43
  • And when I start the MySQL command-line client with `--default-character-set=utf8mb4`, I can no longer see the values correctly. I cannot even type Scandinavian letters in the MySQL command line. I used an SQL script to create the schema. The script was written in Visual Studio Code, where UTF-8 appears to be enabled. – Mikael H. Apr 15 '20 at 08:57
  • Please [edit] your question, and also show what the actual value of that hex string is supposed to be. That makes it possible to identify which character set was actually used to store your data. – Mark Rotteveel Apr 15 '20 at 14:50
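
The hex string quoted in the comments can be inspected outside the database. The following is a minimal sketch, not something posted in the thread: the class and method names are made up for illustration. It decodes the `HEX(col)` output as UTF-8, then undoes one extra layer of encoding by mapping each character back to a single ISO-8859-1 byte and decoding as UTF-8 again. If the second result is readable, the stored value was most likely encoded twice ("double encoded").

```java
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class DoubleEncodingCheck {

    // Convert a hex string (e.g. the output of SELECT HEX(col)) into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decode the stored bytes as UTF-8: this is the text the column really holds.
    static String asStored(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_8);
    }

    // Undo one layer of mis-encoding: turn each character back into one
    // ISO-8859-1 byte, then decode those bytes as UTF-8.
    static String repaired(String hex) {
        byte[] once = asStored(hex).getBytes(StandardCharsets.ISO_8859_1);
        return new String(once, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Hex value quoted in the comments above.
        String hex = "596C6569736CC383C2A4C383C2A46BC383C2A47269";
        // Print through a UTF-8 stream, as suggested in the comments.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(asStored(hex));
        out.println(repaired(hex));
    }
}
```

On a UTF-8 terminal this prints "YleislÃ¤Ã¤kÃ¤ri" and then "Yleislääkäri". The data was therefore already garbled when it was written, so JDBC returns exactly what is stored, and changing the connection string or the output stream cannot fix it.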

1 Answer

I solved this problem by passing `--default-character-set=utf8mb4` to the MySQL command-line client when creating the schema from a separate SQL file. I could also add this option to the MySQL server configuration as a default.
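
A rough simulation of why the client character set matters, under one assumption the thread does not confirm: that the client session was treated as latin1 while the script file contained UTF-8 bytes. The class and method names are hypothetical. The server is modelled as decoding the incoming bytes with the declared client character set and storing the resulting text as UTF-8, which is what a utf8mb4 column does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical simulation class, for illustration only.
class ClientCharsetSimulation {

    // Model of the server side: interpret the bytes sent by the client using
    // the character set the client declared, then store the text as UTF-8.
    // Returns the stored bytes as uppercase hex, like SELECT HEX(col).
    static String storedHex(byte[] sent, Charset declared) {
        String interpreted = new String(sent, declared);
        byte[] stored = interpreted.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : stored) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The SQL script is saved as UTF-8, so these are the bytes the client sends.
        byte[] scriptBytes = "Yleisl\u00e4\u00e4k\u00e4ri".getBytes(StandardCharsets.UTF_8);

        // Assumed faulty case: session declared as latin1 (ISO-8859-1 here).
        System.out.println(storedHex(scriptBytes, StandardCharsets.ISO_8859_1));

        // Declared correctly, as with --default-character-set=utf8mb4.
        System.out.println(storedHex(scriptBytes, StandardCharsets.UTF_8));
    }
}
```

The first line printed is 596C6569736CC383C2A4C383C2A46BC383C2A47269, the same value that `SELECT HEX(col_name)` returned in the comments. The second is 596C6569736CC3A4C3A46BC3A47269, where each "ä" takes the expected two bytes. Rows loaded before the fix stay double encoded until they are reloaded or repaired.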

Mikael H.