I have discovered that the reason I am getting weirdness with my strings being written to MySQL (5.6) from Kotlin is that, occasionally, they are being double encoded. I've verified this by printing the hex() of the column of two identical strings-- here is an example where an API returned the identical string, with the identical jdbcurl parameters, identical everything, ten minutes apart:
Great job getting your protein and fruit to help balance your blood sugar
4772656174206A6F622067657474696E6720796F75722070726F7465696E20616E6420667275697420746F2068656C702062616C616E636520796F757220626C6F6F6420737567617220F09F8D93F09F8D87F09F8D8E
Great job getting your protein and fruit to help balance your blood sugar ðŸ“ðŸ‡ðŸŽ
4772656174206A6F622067657474696E6720796F75722070726F7465696E20616E6420667275697420746F2068656C702062616C616E636520796F757220626C6F6F6420737567617220C3B0C5B8C28DE2809CC3B0C5B8C28DE280A1C3B0C5B8C28DC5BD
For comparison, here's the end of each string where the encoding differs:
F09F8D93F09F8D87F09F8D8E
C3B0C5B8C28DE2809CC3B0C5B8C28DE280A1C3B0C5B8C28DC5BD
Is this double encoding? It looks like it's being encoded in utf-8 then decoded in cp1250 or latin1 then encoded again in utf-8. I mean, it seems like that's the case but I can't honestly figure out what else it could be.
I'm kind of at wits end here. I've read through all of @Rick James great info on this. I have my table columns properly set with utf8mb4
and my jdbcurl ends with:
&useUnicode=true&characterEncoding=UTF-8&sessionVariables=character_set_server=utf8mb4
The variables set on the MySQL server are:
character_set_client = utf8mb4
character_set_connection = utf8mb4
character_set_database = latin1
character_set_filesystem = binary
character_set_results = utf8mb4
character_set_server = latin1
character_set_system = utf8
I really have no idea why it works and then just... doesn't. And generally when it doesn't work, it's like some setting gets changed and sticks because after one starts getting mangled, they all start getting mangled after wards.
This is a database that has multiple connections to it at once from different clients and code bases-- so I'm wondering if there is something causing a double connection to double encode these, but I have no clue at this point. I've verified that the payload is handled properly in Kotlin but can't pinpoint where the double encoding is happening, and why.
Hopefully somebody has some info on this because it's driving me insane! I should mention our older PHP code base is just fine and has no issues reading/writing the same exact payload from the same exact API which makes me think it's something in the Kotlin/Java jdbc driver potentially.
My original post from this weekend is here: Issues writing utf8 strings with emojis from Kotlin to MySQL- utf8 vs utf8mb4
This lead me to check the byte structure.
EDIT:
I am adding this because it could explain the weird behavior I'm seeing. I am using the jdbc connector version 8 (8.0.25) from Kotlin, which according the documentation only officially supports MySQL 5.7 and 8.
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-versions.html
We are rocking MySql 5.6-- so I'm wondering if this could be causing my pain.
EDIT2:
SHOW CREATE TABLE output:
CREATE TABLE `tablename` (
`c1` int(11) unsigned NOT NULL AUTO_INCREMENT,
`c2` int(11) NOT NULL,
`c3` varchar(1000) NOT NULL DEFAULT '',
`c4` smallint(6) NOT NULL DEFAULT '3',
`c5` smallint(6) NOT NULL DEFAULT '0',
`created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=492331 DEFAULT CHARSET=utf8mb4
c3 is the column in question.
I know latin1 is the character_set_server
and that the jdbc connection says for mysql 5.6 you have to set that to utf8mb4 or it will not work-- and I am setting that in the sessionVariables but is not only working sometimes? Does it need to be set on the server itself as the default with a restart of the database?
@RickJames the two dumps I have up above are from doing a HEX()
on two different rows of the same exact table, where I am attempting to write the exact same thing from Kotlin yet one works and the other seems to get double encoded.