0

I'm having a problem with the encoding when dumping a database using mysqldump.

The issue is that the file being generated is breaking non-ASCII characters (for ex. german and spanish characters). The data in the DB is right, but it is exported wrong.

I have tried the following:

  1. using --default-character-set to utf8, utf8mb4, and latin1 (the last option because although the tables are using utf8_general_ci collation, the database itself is set to latin1, I don't know why). Weirdly enough, the output differs in filesize, but the content (specially the problematic characters) shows the same issue in all three cases. As if the option would be ignored.
  2. importing the dumped file into a new mysql service, but since the characters are broken in the file, the import is also broken. for ex. the dump with the utf8mb4 option is imported in a fresh database with character encoding utf8mb4, but since the source file is wrongly encoded, it is not being "transcoded back" to a right form.
  3. Initially I thought that it could be an issue with the version of the mysql server being different (5.7 in the source, 8.0 in the destination server), but since the file seems to be already broken, I now think that this might not be the root-cause. Still lost, so I prefer to mention it just in case it helps.

An example of the sentence I'm running:

mysqldump --default-character-set=utf8mb4 --no-tablespaces -u database_user -p database_name > /home/username/database_name-utf8mb4-20220712.sql

No errors appear neither during the export nor during the import in the new server. Everything seems to run smooth, but the character encoding is messed up, so something isn't OK.

Any support is much appreciated. Thank you!

versvs
  • 643
  • 2
  • 12
  • 30

1 Answers1

1

but the character encoding is messed up

Give us an example. Include a hex dump of a small portion of the file where garbage shows up.

It is likely that the original data was either in character set utf8 or latin1, but the dumping and/or reloading specified the wrong character set. Please provide more details of the dump and load.

Also see: Trouble with UTF-8 characters; what I see is not what I stored

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks, Rick! I had just the basic dump produced by `mysqldump`. It smells like at some point the webapp sending data to the mysql engine was misconfigured and though the database had columns collated to utf8, the app was sending latin1, which was not noticed because it was also the same app decoding the data to show to the end user. A typical sentence with errors is `'Más allá'` (it is supposed to say `Más allá` in spanish with tildes. – versvs Jul 14 '22 at 20:11
  • 1
    That is "Mojibake"; see my link. – Rick James Jul 15 '22 at 03:41
  • Yes! There it is, hehe. I will give it a try to your proposed solution during the weekend :) – versvs Jul 16 '22 at 08:47