1

I have a table defined as follows:

mysql> show create table temptest;
+------------+-----------------------------------------------------------------------------------------------------------+
| Table      | Create Table                                                                                                  |
+------------+-----------------------------------------------------------------------------------------------------------+
| temptest | CREATE TABLE `temptest` (
  `mystring` varchar(100) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+------------+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

When I use mysql console (through mysql temptest) and insert a character through

insert into temptest values ("é");

I can see it is getting saved as "latin1" encoding

mysql> select hex(mystring) from temptest;
+---------------+
| hex(mystring) |
+---------------+
| E9            |
+---------------+

But if I issue a "set names latin1" and perform the same operation, I see it storing the same character in utf8 encoding.

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into temptest values ("é");
Query OK, 1 row affected (0.01 sec)

mysql> select hex(mystring) from temptest;
+---------------+
| hex(mystring) |
+---------------+
| E9            |
| C3A9          |
+---------------+

As far as I understand, "set names" shouldn't affect how mysql stores the data (https://dev.mysql.com/doc/refman/8.0/en/set-names.html). What am I missing here? Any insight into this would be greatly appreciated. Thank you.

Jay-C
  • 23
  • 1
  • 6
  • "What am I missing here?" é in latin1 or utf8 encoding is the same.. – Raymond Nijland Nov 15 '18 at 17:41
  • I don't think é in latin1 and utf8 are encoded to the same byte. – Jay-C Nov 15 '18 at 19:09
  • they are the latin1 chars in utf8 are backwards compatible – Raymond Nijland Nov 15 '18 at 19:12
  • I'm not saying utf8 is not backwards compatible to latin1 for é. What I am saying is the byte encoding is not the same for é in latin1 vs utf8. And you said é is the same encoding in utf8 and latin1, which is not true. And I don't see how this answers my question. – Jay-C Nov 15 '18 at 19:18
  • The small letter é is both in ANSI and in ISO 8859-15 0xE9 (233), and in Unicode accordingly U + 00E9 (233), and is thus stored in UTF-8 as 0xC3 0xA9 (195 169)... So what you get is correct.. – Raymond Nijland Nov 15 '18 at 19:28
  • Maybe my question wasn't clear. I am asking why does issuing "set names latin1" cause the character é to be encoded as utf8 whereas before issuing the set names command, it was being encoded as latin1. – Jay-C Nov 15 '18 at 20:08

1 Answers1

3
  1. SET NAMES latin1 declares that the encoding in your client is latin1.
  2. But (apparently) it is actually utf8.
  3. So, when you type é, the client generates the 2 bytes C3 A9.
  4. Then those are sent as if they were latin1 to the server (mysqld).
  5. The Server says "Oh, I am getting some latin1 bytes, and I will be putting them into a latin1 column, so I don't need to transform them.
  6. In go two latin1 characters é (hex C3A9). This is called Mojibake.
  7. If you do SET NAMES utf8 and SELECT the text, you will "see" é and it will be 4 bytes (hex C383C2A9)!

Bottom line: Your client encoding was really utf8, so you should have said SET NAMES utf8 (or utf8mb4). Confused? Welcome to the club.

Rick James
  • 135,179
  • 13
  • 127
  • 222