3

I have a server hosting MySQL, PHPMyAdmin reports:

Server version: 5.1.56-community
MySQL charset: UTF-8 Unicode (utf8)

I export a sql from using either mysqldump -uroot -p database > file.dump or mysqldump -uroot -p database -r file.dump (both generated files are identical anyway).

Locally, I installed MySQL 5.5 and HeidiSQL 9.5.

As the server's SQL file my.ini has:

default-character-set=utf8

I changed the local my.ini file to have

default-character-set=utf8

But also:

character-set-server=utf8

They were both set to latin1. Dunno why I have character-set-server set here while the server does not. Anyway.

Now I start HeidiSQL, it shows utf8mb4 references instead of utf8 for the sessions parameters. I don't know why:

enter image description here

Now, I import my dumped file, and I see that even if everything is apparently configured in utf8, it looks like I have some encoding problems.

On the server, I see: enter image description here

Locally, in HeidiSQL, I see: enter image description here

Special characters like à are not displayed correctly on the local database.

Am I doing something wrong?

Note that if I install HeidiSQL on the server, the variable tab shows the same values for the Session and Global parameters, and the à is shown correctly.

So this may be the root cause of the problem, but I don't know how to fix it. If I change the Session values before importing the sql file it does not fix the issue, and also values are back to utf8mb4 when I start HeidiSQL again.

jpo38
  • 20,821
  • 10
  • 70
  • 151
  • Why are you running a 7 year old version of mysql? – Evert Sep 21 '18 at 12:43
  • @Evert: Because it has not been updated for the last 7 years ;-) That's actually why I'm working on this today, trying to migrate this to a new system. – jpo38 Sep 21 '18 at 12:51
  • Is the dump UTF-8 encoded? When you open it in a text editor explicitly as UTF-8, do the characters look okay…? – deceze Sep 21 '18 at 13:13
  • @deceze: I opened the file with Notepad++ which let's me select the encoding. `à` character looks right when UTF-8 is selected (default). So looks like the file is UTF-8 encoded. – jpo38 Sep 21 '18 at 13:28

3 Answers3

6

Thanks to deceze comment, I could fix the issue.

In HeidiSQL, when I choose the sql file to execute, there's actually an "ncoding" option I did not notice originally ;-)

If I keep "auto-detect", the import generates bad content (with mojibake characters)

If I force "UTF-8", the import is perfect

Dunno why HeidiSQL fails to auto-detect the encoding...

jpo38
  • 20,821
  • 10
  • 70
  • 151
2

A few thoughts:

It looks like you have the character set set correctly. The fact that HeidiSQL displays a different character set, is probably because clients themselves set a character set.

For example, your mysql server might use "Character set A" by default. If a client connects and says they want "Character set B", the server will convert this on the fly.

utf8mb4 is a superset (and superior to) utf8. It's better to have your server default to utf8mb4. The popular usecase of utf8mb4 is emoji.

Anyway, the reason you are getting mojibake is probably unrelated to having these character sets set correctly.

What I think may have happened is as follows (this is a guess).

  1. Your tables/columns were set as UTF-8.
  2. A client connects and tells the server "I want to use ISO-8559-1/latin instead".
  3. The server happily complies and will convert the clients ISO-8559-1 strings to UTF-8 on the fly.
  4. Despite the client wanting to use ISO-8559-1, it actually sends UTF-8.
  5. The server thinks the data is ISO-8559-1 and treats it as such, and converts the UTF-8 using a ISO-8559-1 to UTF. It's effectively a double-encoding.

If I'm right, it means that you can have all your columns, connections and tables set to UTF-8, but your data is simply bad.

If this is correct, this process is reversable

You really just need the opposite operation. For example, if you had a PHP string $data, which is 'double-encoded' as UTF-8, the process would simply be to call this:

$output = utf8_decode($input)

It's also possible to fix this in MySQL. See this stack overflow question.

A few things to be aware of:

  1. Make sure this is actually the case. Are you getting the correct output after this operation?
  2. Make backups, obviously.
  3. Also make absolutely sure that whatever was writing double-encoded UTF-8 to your database is now fixed. The last thing you want is a table that's a mixture of different encodings.

Sidenote: This problem is extremely common. You are somewhat lucky that you're french because it highlights the problem. Many english systems I've seen have this issue but it largely goes unnoticed for a long time because a lot of text doesn't go outside the common ASCII range.

Evert
  • 93,428
  • 18
  • 118
  • 189
  • Thankx for your help. But what does that mean? That the content of the database on the server is bad and I can't do anything to fix it? They why does phpMyAdmin and HeidiSQL installed on the server both display the database with valid characters? My feeling is that the remote server content and setup is OK, but the way I export data and later import it to my locally setup MySQL server is bad. – jpo38 Sep 21 '18 at 13:04
  • @jpo38 no, it's not corrupted. The process is reversible. I added more info my answer – Evert Sep 21 '18 at 13:16
  • Ran the command from the other SO question. Got warnings saying 'Warning: invalid utf8 character string: 'E0206D''....then the strings are cut when I used to have a mojibake...so it apparently does not fix the issue. – jpo38 Sep 21 '18 at 13:34
  • See my answer, apparently it was just an issue with encoding detection upon import. – jpo38 Sep 21 '18 at 13:41
0

You have "Mojibake". à turns into à (there are two characters, the second is a space).

This is caused when latin1 is involved somewhere in the process. The SESSION and GLOBAL settings are not at fault. Let's see SHOW CREATE TABLE.

See Mojibake in Trouble with UTF-8 characters; what I see is not what I stored for the likely causes. It may involve "Double Encoding"; let's see SELECT col, HEX(col) ....

As for fixing the data -- It depends on whether you have simply Mojibake or Double Encoding. See http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases for both.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks for your help. As said in my own answer it was just a problem with encoding when executing the SQL file to import the database into MySQL. – jpo38 Sep 30 '18 at 16:33
  • @jpo38 - I just added to my answer to address the two techniques for "fixing" the data. – Rick James Sep 30 '18 at 16:38