Am migrating a schema from mysql 5.5.62 to mysql 8.0.27.
The way am doing is getting a mysqldump from 5.5.62 of the required schema and then importing the dump into mysql 8.0.27.
Everything works normal with the imported data however while doing an insert from a java application in one of the table fails with the following error: java.sql.SQLException: Incorrect string value: '\xE2\x80\x8B' for column
The character set in mysql8.0.27 db is as below
mysql> show variables like 'char%';
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
The character set in mysql 5.5.62 db is as below
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
the jdbc connector used in the java application which is performing the insert via hibernate is mysql-connector-java-8.0.27.jar
Why should a user needs to play with character_sets explicitly(when migrating the data to a higher version of mysql) when the user has not done anything special while creating the schema/table in the mysql 5.5.62 server? Any thoughts on what is the best way to deal with issues like these?
Edit on 22-Nov-2021
After several days battling with this finally i came to know that the 'culprit' was a specific text returned by the Microsoft Azure API having a character '\u200b' at the end. I could notice this only when i inspected the RAW text of the response returned by the Azure API in PostMan...
The description of this '\u200b' (Zero Width Space) is clearly explained here https://www.fileformat.info/info/unicode/char/200B/index.htm
Even the UTF-8 (hex) encoding of this character is '0xE2 0x80 0x8B (e2808b)'. which is exactly what is the present in the SQL exception java.sql.SQLException: Incorrect string value: '\xE2\x80\x8B' for column
So for me i dont think that i need to handle this by tweaking the database character set configurations as can simply remove this character from the response and process the text before inserting to the database.
However i would like to keep this post active as too much of my effort was wasted troubleshooting the issue thinking this was something with the charset configuration of the DB.
Having said all this i tried to execute an insert query from MySQLWorkBench with this character into the table in mysql 5.5 and it inserted the data but with a warning 1 row(s) affected, 1 warning(s): 1366 Incorrect string value: '\xC2\x80' for column
Please note there is only a warning by the db and not an error
When i tried to do the same insert into the mysql 8.0 (via MySQLWorkBench) an error was returned by the DB Error Code: 1366. Incorrect string value: '\xC2\x80' for column.
In a real scenario if somebody needs to persist this as is in the table column then how is he/she supposed to do this? Any thoughts?