I'm trying to figure out something which seems odd to me.
A little background first.
We are using MySQL Connector/J version 8.0.23, and we noticed that the character encoding it uses is WINDOWS-1252.
Our database is defined with the character set latin1.
We saw that the MySQL server stores the data as a byte array, and that the connector function that reads a string from the server is as follows:
public String createFromBytes(byte[] bytes, int offset, int length, Field f) {
    return StringUtils.toString(bytes, offset, length,
            f.getCollationIndex() == CharsetMapping.MYSQL_COLLATION_INDEX_binary
                    ? this.pset.getStringProperty(PropertyKey.characterEncoding).getValue()
                    : f.getEncoding());
}
Basically, it reads a byte array with some encoding and returns a string for it (here we saw it use the WINDOWS-1252 encoding).
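To make sure we understand what that function amounts to, here is a minimal sketch in plain Java. It is not the connector's code, and the class and method names are made up for illustration. It decodes the same byte with two different charsets, which is why the encoding chosen on the client side seems to matter so much:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decode a slice of raw bytes with the given charset.
    static String decode(byte[] bytes, int offset, int length, Charset charset) {
        return new String(bytes, offset, length, charset);
    }

    public static void main(String[] args) {
        // 0xE9 is the letter e with an acute accent in latin1 / windows-1252.
        byte[] raw = {(byte) 0xE9};

        // Decoded with windows-1252, the byte maps to U+00E9.
        String asCp1252 = decode(raw, 0, raw.length, Charset.forName("windows-1252"));
        System.out.println((int) asCp1252.charAt(0) == 0xE9);

        // Decoded as UTF-8, a lone 0xE9 is malformed input, so Java
        // substitutes the replacement character U+FFFD.
        String asUtf8 = decode(raw, 0, raw.length, StandardCharsets.UTF_8);
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0);
    }
}
```

The bytes are identical in both cases; only the charset passed to the decoder changes the resulting string.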
So we wondered: how is the character set defined on the MySQL server relevant here?
If the connector is the one doing the encoding and decoding, and the server only holds a byte array, it seems the MySQL server configuration is not used.
So we thought maybe it's used in operations performed by MySQL itself (ORDER BY, JOIN, etc.).
We thought that was the case, but then we saw something else that doesn't make sense.
We had another issue where we tried to insert an emoji and got the following error message:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x89",...' for column...
MySQL Connector was on version 8.0.27 (no character encoding was defined, so it should use UTF-8 as far as I know), and the column was defined as utf8 (an alias for utf8mb3).
In addition, when we downgraded the MySQL connector to 8.0.23 (which used the Windows-1252 character set), we didn't get an error from the server; the emoji was saved as '?'.
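For reference, this is what plain Java does with the same emoji. It is only a sketch with hypothetical class and method names, and it assumes the 8.0.23 connector encodes the string on the client roughly the way String.getBytes does, which we have not verified:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EmojiEncodeDemo {
    // Hypothetical helper: render bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F609 (winking face), written as a surrogate pair.
        String emoji = "\uD83D\uDE09";

        // UTF-8 needs four bytes for this code point: F0 9F 98 89,
        // the same bytes that appear in the server's error message.
        System.out.println(hex(emoji.getBytes(StandardCharsets.UTF_8)));

        // windows-1252 cannot represent the emoji, so getBytes substitutes
        // the charset's replacement byte 3F, which is '?'.
        System.out.println(hex(emoji.getBytes(Charset.forName("windows-1252"))));
    }
}
```

If that assumption holds, the '?' would be produced before the data ever reaches the server, whereas with UTF-8 the server receives a four-byte sequence.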
What is going on here?
I saw the post "How to store Emoji Character in MySQL Database", which says to set both the connection and the server to utf8mb4.
I would expect it to also work when only the connection is configured as utf8mb4.
As I understand it:
1. The emoji will be encoded properly to the byte array.
2. The byte array will be saved in the column.
3. The byte array will be read as utf8mb4.
Thank you