2

In a large system based on Mysql 5.5.57 Php 5.6.37 setup

Currently the whole system is working in utf8 including SET NAMES utf8 at the beginning of each db connection.

I need to support emojis in one of the tables so I need to switch it to utf8mb4. I don't want to switch other tables.

My question is - if I change to SET NAMES utf8mb4 for all connections (utf8 and utf8mb4) and switch the specific table only to utf8mb4 (and only write mb4 data to this table). Will the rest of the system work as before?

Can there be any issue from working with SET NAMES utf8mb4 in the utf8 tables/data/connections?

Nir
  • 24,619
  • 25
  • 81
  • 117
  • 2
    Possible duplicate of [UTF-8 all the way through](https://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – Madhur Bhaiya Nov 02 '18 at 12:40
  • not a duplicate. I'm asking about the hybrid usage of utf8 and utf8mb4 – Nir Nov 02 '18 at 13:14
  • Personally, I'd recommend you to switch to utf8mb4 completely. Even MySQL team is claiming that utf8mb4 is now significantly faster than utf8. Ref: http://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/ PS: not my downvote :-) – Madhur Bhaiya Nov 02 '18 at 13:20
  • 1
    PSA: Probably time to update to 5.6.38 since there's a [vulnerability in 5.6.37](http://php.net/ChangeLog-5.php#5.6.38). – tadman Nov 02 '18 at 17:32
  • What you should be asking is "If this works, do I want this to be the way the system works going forward? Would it be better to just get off of utf8 and get on utf8mb4? How will development efforts and security suffer if I implement a hybrid scheme? What are the stakes? What is the worst case scenario?" – Anthony Rutledge Nov 03 '18 at 03:25
  • The reason I don't want to switch all my tables is that it requires table optimize - this is a large production system. Not happy with such blocking operations. Doing it to one table is way more feasible than to switch everything – Nir Nov 04 '18 at 10:21

1 Answers1

3

I think there should no problem using SET NAMES utf8mb4 for all connections.

(utf8mb3 is a synonym of utf8 in MySQL; I'll use the former for clarity.)

utf8mb3 is a subset of utf8mb4, so your client's bytes will be happy either way (except for Emoji, which needs utf8mb4). When the bytes get to (or come from) a column that is declared only there will be a check to verify that you are not storing Emoji or certain Chinese characters, but otherwise, it goes through with minimal fuss.

I suggest

 ALTER TABLE ... CONVERT TO utf8mb4

as the 'right' way to convert a table. However, it converts all varchar/text columns. This may be bad...

If you JOIN a converted table to an unconverted table, then you will be trying to compare a utf8mb3 string to a utf8mb4 string. MySQL will throw up its hands and convert all rows from one to the other. That is no INDEX will be useful.

So... Be sure to at least be consistent about any columns that are involved in JOINs.

Rick James
  • 135,179
  • 13
  • 127
  • 222