1

Am migrating a schema from mysql 5.5.62 to mysql 8.0.27.

The way am doing is getting a mysqldump from 5.5.62 of the required schema and then importing the dump into mysql 8.0.27.

Everything works normal with the imported data however while doing an insert from a java application in one of the table fails with the following error: java.sql.SQLException: Incorrect string value: '\xE2\x80\x8B' for column

The character set in mysql8.0.27 db is as below

mysql> show variables like 'char%';
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | utf8mb4                        |
| character_set_connection | utf8mb4                        |
| character_set_database   | utf8mb4                        |
| character_set_filesystem | binary                         |
| character_set_results    | utf8mb4                        |
| character_set_server     | utf8mb4                        |
| character_set_system     | utf8mb3                        |
| character_sets_dir       | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)

The character set in mysql 5.5.62 db is as below

mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)

the jdbc connector used in the java application which is performing the insert via hibernate is mysql-connector-java-8.0.27.jar

Why should a user needs to play with character_sets explicitly(when migrating the data to a higher version of mysql) when the user has not done anything special while creating the schema/table in the mysql 5.5.62 server? Any thoughts on what is the best way to deal with issues like these?

Edit on 22-Nov-2021

After several days battling with this finally i came to know that the 'culprit' was a specific text returned by the Microsoft Azure API having a character '\u200b' at the end. I could notice this only when i inspected the RAW text of the response returned by the Azure API in PostMan...

The description of this '\u200b' (Zero Width Space) is clearly explained here https://www.fileformat.info/info/unicode/char/200B/index.htm

Even the UTF-8 (hex) encoding of this character is '0xE2 0x80 0x8B (e2808b)'. which is exactly what is the present in the SQL exception java.sql.SQLException: Incorrect string value: '\xE2\x80\x8B' for column

So for me i dont think that i need to handle this by tweaking the database character set configurations as can simply remove this character from the response and process the text before inserting to the database.

However i would like to keep this post active as too much of my effort was wasted troubleshooting the issue thinking this was something with the charset configuration of the DB.

Having said all this i tried to execute an insert query from MySQLWorkBench with this character into the table in mysql 5.5 and it inserted the data but with a warning 1 row(s) affected, 1 warning(s): 1366 Incorrect string value: '\xC2\x80' for column
Please note there is only a warning by the db and not an error

When i tried to do the same insert into the mysql 8.0 (via MySQLWorkBench) an error was returned by the DB Error Code: 1366. Incorrect string value: '\xC2\x80' for column.

In a real scenario if somebody needs to persist this as is in the table column then how is he/she supposed to do this? Any thoughts?

active_coder
  • 79
  • 1
  • 2
  • 13

1 Answers1

0

Hex E2808B is the "ZERO WIDTH SPACE" in UTF-encoding.

Please provide SHOW CREATE TABLE for the table in question and indicate which column was being stored into.

Normally, I would point you at Trouble with UTF-8 characters; what I see is not what I stored for further diagnosis and http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases for how to fix whichever case you have.

Can you show us some more context? That is what characters are around those 3 hex characters.

I can't quite tell if this is a "double-encoding" case or UTF-8 text incorrectly declared to be latin1 when it was INSERTed. If you can find how the string was inserted, that might help.

C280 smells like an incorrectly encoded Eurosign.

Rick James
  • 135,179
  • 13
  • 127
  • 222