Short Answer: Yes, if you are only using 3-byte (or shorter) UTF-8 characters.
Or... No if you intend to work with 4-byte UTF-8 characters such as .
Long Answer:
(And I will address why "no" could be the right answer.)
The connection establishes what encoding the client is using.
The CHARACTER SET
on a column (or, by default, from the table) establishes what encoding can be put into the column.
CHARACTER SET utf8
is a subset of utf8mb4
. That is, all characters acceptable to utf8
(via connection or column) are acceptable to utf8mb4
. Phrased another way, MySQL's utf8mb4
(same as the outside world's UTF-8
) have the full 4-byte utf-8 encoding that includes more Emoji, more Chinese, etc, than MySQL up-to-3-byte utf8
(aka "BMP")
(Technically, utf8mb4
only handles up to 4 bytes, but UTF-8
handles longer characters. However, I doubt if 5-byte characters will happen in my lifetime.)
So, here's what happens with any 3-byte (or shorter) UTF-8 character in the client, given that the Connection is utf8mb4 and the columns in the tables are only utf8: Every character goes into and comes out of the server without transformation and without errors. Note: The problem occurs on INSERT
, not on SELECT
; however you may not notice the problem until you do a SELECT
.
But, what if you have an Emoji in the client? Now you will get an error. (Or a truncated string) (Or question mark(s)) This is because the 4-byte Emoji (eg, ) cannot be squeezed into the 3-byte "utf8" (or "1-byte latin1" or ...).
If you are running 5.5 or 5.6, you may run into the 767 (or 191) problem. I provide several workarounds in here. None is perfect.
As for inverting (utf8 connection but utf8mb4 columns): The SELECT
can have trouble if you manage to get some 4-byte characters into the table.
"Official sources" -- Good luck. I have spent a decade trying to tease out the ins and outs of character handling, and then simplify it into actionable sentences. Much of that time was thinking I had all the answers, only to encounter yet another failing test case. The common cases are listed in Trouble with UTF-8 characters; what I see is not what I stored . However, that does not directly address your question!
From Comment
mysql> SHOW CREATE TABLE emoji\G
*************************** 1. row ***************************
Table: emoji
Create Table: CREATE TABLE `emoji` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`text` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> insert into emoji (text) values ("abc");
Query OK, 1 row affected (0.01 sec)
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
The above says that the "connection" (think "client") is using utf8, not utf8mb4.
mysql> insert into emoji (text) values (""); -- 4-byte Emoji
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+----------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x98\x85\xF0\x9F...' for column 'text' at row 1 |
+---------+------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Now, change the 'connection' to utf8mb4
:
mysql> SET NAMES utf8mb4;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into emoji (text) values ("");
Query OK, 1 row affected (0.01 sec)
mysql> SELECT * FROM emoji;
+----+--------------+
| id | text |
+----+--------------+
| 1 | ? ? ? ? |
| 2 | abc |
| 3 | ???????????? | -- from when "utf8" was in use
| 4 | | -- Success with utf8mb4 in use
+----+--------------+
4 rows in set (0.01 sec)