3

Seems like I'm dealing with a fairly common problem. My production MySQL database has a default charset and collation of latin1/latin1_swedish_ci. I neglected to change it early on, and my Rails app has grown to a decent size with an international audience; non-english characters cause exceptions to be thrown while emojis display as question marks.

I know I have a large task ahead of me to convert to utf8mb4/utf8mb4_general_ci. From what I understand if I just run something like this my data could end up compromised.

It seems fairly straightforward to change the database-wide charset/collation like this:

ALTER DATABASE mydbname CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci;

My question is, will changing JUST the database charset/collation immediately affect reads & writes? Or does that just establish what encoding gets applied when new tables are created within the database?

If I want to eventually convert all my tables it seems like step one would be to deal with the database-wide level first... but what side-effects will that have immediately?

DelPiero
  • 489
  • 1
  • 10
  • 21

2 Answers2

1

Yes, changing encoding works immediately, but on the database level it only changes default encoding for new tables, to make it work with existing columns, you have to change encoding on column level. When you do that, you may encounter errors such as lack of disk space (when converting from 1-byte to 2-bytes or 4-bytes you will need more space) or overflow of the max allocated row space per table, if the table has many VARCHAR columns, as MySQL cannot allocate more than 66kB per row, see here.

  • 1
    So the short version is: `ALTER DATABASE` will apply to new tables, `ALTER TABLE CHARACTER SET` will apply to new columns, `ALTER TABLE MODIFY CHARACTER SET` will apply retroactively to all data in the column. – Vladislav Kryshtanovskiy Apr 27 '20 at 16:24
  • Ok, so it's not necessarily true that `ALTER DATABASE` will impact reads/writes immediately. Like you said, it just sets up what new tables will be going forward... yes? – DelPiero Apr 27 '20 at 16:26
  • @DelPiero Yes, sorry if it is not obvious, `ALTER DATABASE` does not impact any reads or writes at all, it only impacts following `CREATE` statements. – Vladislav Kryshtanovskiy Apr 27 '20 at 16:29
  • The 64KB limit is bogus. If you run into something like that, there is a workaround. – Rick James Apr 28 '20 at 05:35
1

Beware of ALTER. If you use the wrong flavor, it will make thing worse.

Here is checklist that leads to which flavor of ALTER to use: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • That's helpful, thanks. Forgive my beginner question here, but when would things get worse? Right now my tables have `latin1` encoding in various `TEXT` and `VARCHAR` fields. People write mostly in english on the site, but if you were to input Chinese characters for example it throws an error. I did a test to convert one table to `utf8` (`ALTER TABLE mytable CONVERT TO CHARACTER SET utf8`) and it went well, existing text looks good and now Chinese characters work as well. What is the danger? – DelPiero Apr 28 '20 at 05:02
  • @DelPiero - You see how many different cases are there? It took me a looking time to discover all those cases and figure them out. I still worry that some cases are missing. Those are only cases were a _single_ mistake was made. If you have messed up characters and apply the wrong `ALTER`, now there are _two_ errors. The list of fixes grows quadratically! Usually I point people to this companion essay: https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Apr 28 '20 at 05:23
  • @DelPiero - And... My favorite diagnostic is to get the HEX from a sample. I can quickly determine whether it is 3-byte Chinese or 4-byte Chinese, or garbled "double encoding", etc. Note: utf8 is limited to 3-byte Chinese; you should use utf8mb4. – Rick James Apr 28 '20 at 05:26
  • From what I understand from utf8 to utf8mb4 `VARCHAR` length 255 doesn't work? – DelPiero Apr 28 '20 at 05:29
  • @DelPiero - For MySQL 5.5 and 5.6, you need a workaround when indexing such a column: http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexes – Rick James Apr 28 '20 at 05:36
  • But not for MySQL 5.7? I fired up a test Rails app with MySQL 5.7.xx and `utf8mb4` encoding right from the start, and when you run a db migration and create a table `VARCHAR`s look to be 255 by default like they always were... – DelPiero Apr 29 '20 at 19:21
  • @DelPiero - Correct. In 5.7, MySQL finally cleaned up the 191 mess that was introduced in 5.5. – Rick James Apr 30 '20 at 00:30