0

I recently had to changed mysql from latin-1 to utf-8 to handle Russian characters. They were originally showing up as ?????.

I also had to change a couple of tables in my database to utf8mb4. I originally had these set to utf8 but this did not have enough bits to handle certain characters.

I have to make a change to a production database and want to ensure that i do not have any issues a few months down the line with a particular encoding type.

So my question is when do i use what encoding on a table?

Prabhat G
  • 2,974
  • 1
  • 22
  • 31
  • 1
    Opinion based : I generally use `utf8mb4` and `utf8mb4_unicode_520_ci`. These give improved collation. – Prabhat G Jun 28 '17 at 10:19
  • Only one character set applies to any data stored in mysql at any point of time, so the when question is moot. The answer to the what is simple: use the character set that contains all the characters you need to use within your application. – Shadow Jun 28 '17 at 10:28
  • i agree, but i had to ask. i think the concern was changing every table in the db to this encoding, i.e. issues with existing data and output. –  Jun 28 '17 at 10:30

1 Answers1

-1

You have multiple questions.

The "???" probably came from converting from latin1 to utf8 incorrectly. The data is now lost, since only '?' remains. SELECT HEX(...) ... to confirm that all you get is 3F (?) where you should get something useful.

See "question marks" in Trouble with utf8 characters; what I see is not what I stored .

utf8mb4 and utf8 handle Cyrillic (Russian) identically, so the CHARACTER SET is not the issue with respect to the "???".

If you have an original copy of the data, then probably you want the 3rd item in here -- "CHARACTER SET latin1, but have utf8 bytes in it; leave bytes alone while fixing charset". That is what I call the two-step ALTER.

As for avoiding future issues... See "Best Practice" in my first link. If all you need is European (including Russian), either utf8 or utf8mb4 will suffice. But if you want Emoji or all of Chinese, then go with utf8mb4.

Also, note that you must specify what charset the client is using; this is a common omission, and was probably part of what got you in trouble in the first place.

Rick James
  • 135,179
  • 13
  • 127
  • 222