791

Is there a collation type which is officially recommended by MySQL, for a general website where you aren't 100% sure of what will be entered? I understand that all the encodings should be the same, such as MySQL, Apache, the HTML and anything inside PHP.

In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I'm thinking it's one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci, and utf8_bin before, and I do not know which of these "utf8" maps to, or if that is the best to use.

SamGoody
  • 13,758
  • 9
  • 81
  • 91
Darryl Hein
  • 142,451
  • 95
  • 218
  • 261
  • 45
    Side note: MySQL's "utf8" isn't proper UTF-8 (no support for 4+ byte Unicode characters like ), however "utf8mb4" is. With utf8, a field will be truncated on insert starting with the first unsupported Unicode character. http://mathiasbynens.be/notes/mysql-utf8mb4 – basic6 Apr 27 '14 at 17:47
  • 8
    I wonder if we'll ever need 5 bytes for all those emojis... *sigh* – Álvaro González Jul 13 '15 at 09:43
  • 2
    Related question: https://stackoverflow.com/questions/38228335/which-mysql-collation-exactly-matches-phps-string-comparison "Which MySQL collation exactly matches PHP's string comparison?" – William Entriken Jul 06 '16 at 15:53
  • For an overview of the sane options: https://www.monolune.com/mysql-utf8-charsets-and-collations-explained/ – Flux Feb 15 '18 at 23:16
  • 1
    **2003 Update**: There is indeed a officially recommended encoding, **utf8mb4**, and collation: for v5.7+ **utf8mb4_unicode_520_ci** and for v8+ **utf8mb4_0900_ai_ci**. I would like to write a more complete answer, but this question is closed as opinion based, despite being IMO objective and useful (over 2K Q/A upvotes despite being closed!) – SamGoody Mar 14 '23 at 11:57
  • @SamGoody - (Perhaps you meant 2023?) "Reopening" is hard to achieve, especially after a review rejected such. Since this Question is 15 years old, it would be better to open a new Question, being sure to use the terms "character set" and "collation" correctly. – Rick James Mar 28 '23 at 01:18

11 Answers11

670

The main difference is sorting accuracy (when comparing characters in the language) and performance. The only special one is utf8_bin which is for comparing characters in binary format.

utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate (for sorting). The specific language utf8 encoding (such as utf8_swedish_ci) contain additional language rules that make them the most accurate to sort for those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.

You can read more on specific unicode character sets on the MySQL manual - http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

Overflowh
  • 1,103
  • 6
  • 18
  • 40
Eran Galperin
  • 86,251
  • 24
  • 115
  • 132
  • 4
    small performance improvements ? are you sure about this ? http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/com.ibm.db2.luw.admin.nls.doc/doc/c0053263.html The collation you choose can significantly impact the performance of queries in the database. – Adam Ramadhan Aug 07 '10 at 07:54
  • 65
    This is for DB2 not MySQL. Also, there are no concrete numbers or benchmarks so you are just basing it on the opinion of the writer. – Eran Galperin Aug 09 '10 at 12:14
  • 3
    Note that if you want to use functions, there is a bug in MySQL (most currently distributed versions) where functions always return the string using utf8_general_ci, causing problems if you're using another collation for your strings - see http://bugs.mysql.com/bug.php?id=24690 – El Yobo Feb 09 '11 at 10:49
  • 1
    From my experience with different locales I'd always use `utf8_unicode_*` – Shiplu Mokaddim Dec 17 '12 at 22:51
  • 1
    See also: http://stackoverflow.com/questions/2344118/utf-8-general-bin-unicode – Costa Apr 10 '13 at 03:38
  • @ElYobo Your remark is important. It has caused me many pains in the past. In that scenario I usually have to specify that It needs to use the particular (utf8_swedish_ci) one that I'm using. This is also true for stored procedures. – Manatax Aug 16 '13 at 22:43
  • 16
    Update: For newer versions, recommend `utf8mb4` and `utf8mb4_unicode_520_ci`. These give you the rest of Chinese, plus improved collation. – Rick James Mar 04 '16 at 23:30
  • @RickJames can you be specific on what 'newer versions' means? – TKoL May 21 '19 at 09:55
  • 1
    @TKoL - 5.5 introduced `utf8mb4`; 5.7 fixed some kinks; 8.0 made it the default and improved the collations. – Rick James May 21 '19 at 15:49
  • @RickJames, there is a newer update (2023: Encoding: utf8mb4. Collation: for v5.7+ utf8mb4_unicode_520_ci and for v8+ utf8mb4_0900_ai_ci. See my comment on the question. (And vote to reopen if you have the rights.) – SamGoody Mar 14 '23 at 12:13
152

Actually, you probably want to use utf8_unicode_ci or utf8_general_ci.

  • utf8_general_ci sorts by stripping away all accents and sorting as if it were ASCII
  • utf8_unicode_ci uses the Unicode sort order, so it sorts correctly in more languages

However, if you are only using this to store English text, these shouldn't differ.

SiHa
  • 7,830
  • 13
  • 34
  • 43
Vegard Larsen
  • 12,827
  • 14
  • 59
  • 102
  • 2
    I like your explaination! Good one. But I need better understanding on exactly why unicode sort order is better way to sort correctly than stripping away accents. – weia design Jun 05 '13 at 13:23
  • 15
    @Adam It really depends on your target audience. Sorting is a tricky problem to localize correctly. E.g. in Norwegian the letters Æ Ø Å are the last 3 of the alphabet. With utf8_general_ci, Ø and Å gets converted to O and A, which puts them in the completely wrong position when sorted (I am not sure how Æ is handled, as it is a ligature, not an accented character). This sort order is different in almost any language, e.g. Norwegian and Swedish have different orders (and slightly different letters which are considered equal): Æ Ø Å is sorted Å Æ Ø (actual letters are Å Ä Ö). Unicode fixes this. – Vegard Larsen Jun 06 '13 at 06:18
  • So what I am basically saying, is that you should probably use a language-specific sort if you can, but in most cases that is unfeasible, so go for Unicode general sorting. It will still be strange in some language, but more correct than ASCII. – Vegard Larsen Jun 06 '13 at 06:19
  • Wouldn't this also affect how data is stored? English only is not really enough this days. – Manatax Aug 16 '13 at 22:47
  • 3
    @Manatax - with any of the utf8_ collations, the data is stored as utf8. The collation is just about what characters are considered equal, and how they're ordered. – frymaster Oct 29 '13 at 11:55
  • 2
    @frymaster - not true, as per: http://mathiasbynens.be/notes/mysql-utf8mb4 "MySQL’s utf8 only allows you to store 5.88% of all possible Unicode code points" – data Jun 17 '14 at 08:39
  • The link is correct, but that doesn't mean anything I said isn't true. – frymaster Jun 17 '14 at 13:27
  • "if you are only using this to store English text, these shouldn't differ." That's a slightly *naïve* overgeneralization ;) In other words, not even an all-English text is guaranteed to be ASCII. (Yes, this whole comment *is* in English.) – Piskvor left the building Nov 09 '15 at 13:31
  • @Piskvor Note that the only two collations I mentioned where both UTF-8, and not ASCII... :) – Vegard Larsen Nov 09 '15 at 17:28
  • @VegardLarsen: Indeed. However, one would sort "naive, naïve, name", the other "naive, name, naïve" (as the `ï` codepoint is further down the table than `m`). – Piskvor left the building Nov 10 '15 at 07:59
125

Be very, very aware of this problem that can occur when using utf8_general_ci.

MySQL will not distinguish between some characters in select statements, when utf8_general_ci collation is used. This can lead to very nasty bugs - especially for example, where usernames are involved. Depending on the implementation that uses the database tables, this problem could allow malicious users to create a username matching an administrator account.

This problem exposes itself at the very least in early 5.x versions - I'm not sure if this behaviour has changed later.

I'm not a DBA, but to avoid this problem, I always go with utf8-bin instead of a case-insensitive one.

The script below describes the problem by example.

-- first, create a sandbox to play in
CREATE DATABASE `sandbox`;
use `sandbox`;

-- next, make sure that your client connection is of the same 
-- character/collate type as the one we're going to test next:
charset utf8 collate utf8_general_ci

-- now, create the table and fill it with values
CREATE TABLE `test` (`key` VARCHAR(16), `value` VARCHAR(16) )
    CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO `test` VALUES ('Key ONE', 'value'), ('Key TWO', 'valúe');

-- (verify)
SELECT * FROM `test`;

-- now, expose the problem/bug:
SELECT * FROM test WHERE `value` = 'value';

--
-- Note that we get BOTH keys here! MySQLs UTF8 collates that are 
-- case insensitive (ending with _ci) do not distinguish between 
-- both values!
--
-- collate 'utf8_bin' doesn't have this problem, as I'll show next:
--

-- first, reset the client connection charset/collate type
charset utf8 collate utf8_bin

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Note that we get just one key now, as you'd expect.
--
-- This problem appears to be specific to utf8. Next, I'll try to 
-- do the same with the 'latin1' charset:
--

-- first, reset the client connection charset/collate type
charset latin1 collate latin1_general_ci

-- next, convert the values that we've previously inserted
-- in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_ci;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Again, only one key is returned (expected). This shows 
-- that the problem with utf8/utf8_generic_ci isn't present 
-- in latin1/latin1_general_ci
--
-- To complete the example, I'll check with the binary collate
-- of latin1 as well:

-- first, reset the client connection charset/collate type
charset latin1 collate latin1_bin

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Again, only one key is returned (expected).
--
-- Finally, I'll re-introduce the problem in the exact same 
-- way (for any sceptics out there):

-- first, reset the client connection charset/collate type
charset utf8 collate utf8_generic_ci

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- now, re-check for the problem/bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Two keys.
--

DROP DATABASE sandbox;
Channa
  • 742
  • 17
  • 28
Guus
  • 2,986
  • 2
  • 21
  • 32
  • 37
    -1: This is surely remedied by applying a unique key to the relevant column. You would see the same behaviour if the two values were `'value'` and `'valUe'`. The whole point of a collation is that it provides rules for (among other things) when two strings are considered equal to one another. – Hammerite Jun 09 '11 at 10:26
  • 14
    That's exactly the problem that I'm trying to illustrate - the collation makes two things equal while they in fact are not intended to be equal at all (and thus, a unique constraint is exactly the opposite of what you'd want to achive) – Guus Aug 10 '11 at 19:49
  • 19
    But you describe it as a "problem" and leading to "bugs" when the behaviour is exactly what a collation is intended to achieve. Your description is correct, but only in as much as it is an error on the part of the DBA to select an inappropriate collation. – Hammerite Aug 11 '11 at 15:42
  • 33
    The thing is that, when you enter two usernames that are considered equal by the collation, it will not be allowed if you set the coloumn username to be unique, which you should of course do! – Student of Hogwarts Dec 01 '12 at 10:54
  • 12
    I upvoted both this answer and @Hammerite's comment, because both of them combined helped me reach an understanding of collation. – Nacht Jun 25 '15 at 01:45
  • In fact, instead of telling the db that `'value'` and `'vaLue'` should be considered the same and then disallowing equal values in that column, what you probably want is turning off those collation features at all by setting the collation to `utf8_bin`. Here only equal values are regarded the same. – Conic Jul 30 '18 at 14:43
  • 1
    I might be wrong, but isn't collation only about _sorting_, *not* uniqueness? `value` and `VaLuE` might be be the same for _sorting purposes only_, but I really doubt a collation will affect uniqueness of values. – MestreLion Jan 15 '19 at 10:57
112

It is best to use character set utf8mb4 with the collation utf8mb4_unicode_ci.

The character set, utf8, only supports a small amount of UTF-8 code points, about 6% of possible characters. utf8 only supports the Basic Multilingual Plane (BMP). There 16 other planes. Each plane contains 65,536 characters. utf8mb4 supports all 17 planes.

MySQL will truncate 4 byte UTF-8 characters resulting in corrupted data.

The utf8mb4 character set was introduced in MySQL 5.5.3 on 2010-03-24.

Some of the required changes to use the new character set are not trivial:

  • Changes may need to be made in your application database adapter.
  • Changes will need to be made to my.cnf, including setting the character set, the collation and switching innodb_file_format to Barracuda
  • SQL CREATE statements may need to include: ROW_FORMAT=DYNAMIC
    • DYNAMIC is required for indexes on VARCHAR(192) and larger.

NOTE: Switching to Barracuda from Antelope, may require restarting the MySQL service more than once. innodb_file_format_max does not change until after the MySQL service has been restarted to: innodb_file_format = barracuda.

MySQL uses the old Antelope InnoDB file format. Barracuda supports dynamic row formats, which you will need if you do not want to hit the SQL errors for creating indexes and keys after you switch to the charset: utf8mb4

  • #1709 - Index column size too large. The maximum column size is 767 bytes.
  • #1071 - Specified key was too long; max key length is 767 bytes

The following scenario has been tested on MySQL 5.6.17: By default, MySQL is configured like this:

SHOW VARIABLES;

innodb_large_prefix = OFF
innodb_file_format = Antelope

Stop your MySQL service and add the options to your existing my.cnf:

[client]
default-character-set= utf8mb4

[mysqld]
explicit_defaults_for_timestamp = true
innodb_large_prefix = true
innodb_file_format = barracuda
innodb_file_format_max = barracuda
innodb_file_per_table = true

# Character collation
character_set_server=utf8mb4
collation_server=utf8mb4_unicode_ci

Example SQL CREATE statement:

CREATE TABLE Contacts (
 id INT AUTO_INCREMENT NOT NULL,
 ownerId INT DEFAULT NULL,
 created timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 modified timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 contact VARCHAR(640) NOT NULL,
 prefix VARCHAR(128) NOT NULL,
 first VARCHAR(128) NOT NULL,
 middle VARCHAR(128) NOT NULL,
 last VARCHAR(128) NOT NULL,
 suffix VARCHAR(128) NOT NULL,
 notes MEDIUMTEXT NOT NULL,
 INDEX IDX_CA367725E05EFD25 (ownerId),
 INDEX created (created),
 INDEX modified_idx (modified),
 INDEX contact_idx (contact),
 PRIMARY KEY(id)
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ENGINE = InnoDB ROW_FORMAT=DYNAMIC;
  • You can see error #1709 generated for INDEX contact_idx (contact) if ROW_FORMAT=DYNAMIC is removed from the CREATE statement.

NOTE: Changing the index to limit to the first 128 characters on contacteliminates the requirement for using Barracuda with ROW_FORMAT=DYNAMIC

INDEX contact_idx (contact(128)),

Also note: when it says the size of the field is VARCHAR(128), that is not 128 bytes. You can use have 128, 4 byte characters or 128, 1 byte characters.

This INSERT statement should contain the 4 byte 'poo' character in the 2 row:

INSERT INTO `Contacts` (`id`, `ownerId`, `created`, `modified`, `contact`, `prefix`, `first`, `middle`, `last`, `suffix`, `notes`) VALUES
(1, NULL, '0000-00-00 00:00:00', '2014-08-25 03:00:36', '1234567890', '12345678901234567890', '1234567890123456789012345678901234567890', '1234567890123456789012345678901234567890', '12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678', '', ''),
(2, NULL, '0000-00-00 00:00:00', '2014-08-25 03:05:57', 'poo', '12345678901234567890', '', '', '', '', ''),
(3, NULL, '0000-00-00 00:00:00', '2014-08-25 03:05:57', 'poo', '12345678901234567890', '', '', '123', '', '');

You can see the amount of space used by the last column:

mysql> SELECT BIT_LENGTH(`last`), CHAR_LENGTH(`last`) FROM `Contacts`;
+--------------------+---------------------+
| BIT_LENGTH(`last`) | CHAR_LENGTH(`last`) |
+--------------------+---------------------+
|               1024 |                 128 | -- All characters are ASCII
|               4096 |                 128 | -- All characters are 4 bytes
|               4024 |                 128 | -- 3 characters are ASCII, 125 are 4 bytes
+--------------------+---------------------+

In your database adapter, you may want to set the charset and collation for your connection:

SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'

In PHP, this would be set for: \PDO::MYSQL_ATTR_INIT_COMMAND

References:

Jeremy Postlethwaite
  • 1,204
  • 1
  • 10
  • 6
  • There are more details about utf8mb4 on [MySQL 5.6 Reference Manual: 10.1.10.7 The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding)](http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-utf8mb4.html) – Jeremy Postlethwaite Aug 24 '14 at 21:01
  • More information about [Wikipedia: Unicode planes](http://en.wikipedia.org/wiki/Plane_(Unicode)) – Jeremy Postlethwaite Aug 24 '14 at 21:07
  • 7
    utf8mb4_unicode_ci should absolutely be the recommended collation for new projects in 2015. – Trevor Gehman Jul 07 '15 at 16:44
  • 8
    Update... `utf8mb4_unicode_520_ci` is better. In the future, there will be `utf8mb4_unicode_800_ci` (or something like that), as MySQL catches up with the Unicode standards. – Rick James Apr 29 '16 at 04:17
  • @JeremyPostlethwaite `utf8mb4_unicode_ci` is not capable to store `Albanian` language keywords eg. `Politika e privatësisë`. `utf8mb4_unicode_ci` stores `Politika e privatësisë` instead of `Politika e privatësisë` in the database table field. Not useful for me. Looking another best solution. Thanks dear. – Kamlesh Sep 12 '21 at 08:22
  • @Kamlesh - That is "Mojibake", which is caused by a mismatch one _character set_. The _collation_ is not relevant. See https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Mar 28 '23 at 01:14
49

Collations affect how data is sorted and how strings are compared to each other. That means you should use the collation that most of your users expect.

Example from the documentation for charset unicode:

utf8_general_ci also is satisfactory for both German and French, except that ‘ß’ is equal to ‘s’, and not to ‘ss’. If this is acceptable for your application, then you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.

So - it depends on your expected user base and on how much you need correct sorting. For an English user base, utf8_general_ci should suffice, for other languages, like Swedish, special collations have been created.

surfmuggle
  • 5,527
  • 7
  • 48
  • 77
Tomalak
  • 332,285
  • 67
  • 532
  • 628
  • 1
    i was using utf8_general_ci and it took a couple of second while sorting and armscii_general_ci did it extremely quick.Why this happened?One more Question , What do you think which collation is used by social networking sites –  May 04 '12 at 15:27
23

Essentially, it depends on how you think of a string.

I always use utf8_bin because of the problem highlighted by Guus. In my opinion, as far as the database should be concerned, a string is still just a string. A string is a number of UTF-8 characters. A character has a binary representation so why does it need to know the language you're using? Usually, people will be constructing databases for systems with the scope for multilingual sites. This is the whole point of using UTF-8 as a character set. I'm a bit of a pureist but I think the bug risks heavily outweigh the slight advantage you may get on indexing. Any language related rules should be done at a much higher level than the DBMS.

In my books "value" should never in a million years be equal to "valúe".

If I want to store a text field and do a case insensitive search, I will use MYSQL string functions with PHP functions such as LOWER() and the php function strtolower().

Phil
  • 231
  • 2
  • 2
  • 10
    If binary comparison of strings is your desired comparison, then of course you should use the binary collation; but to dismiss alternative collations as a "bug risk" or being simply for convenience of indexing suggests that you do not fully understand the point of a collation. – Hammerite Jun 09 '11 at 10:32
13

The accepted answer fairly definitively suggests using utf8_unicode_ci, and whilst for new projects that's great, I wanted to relate my recent contrary experience just in case it saves anyone some time.

Because utf8_general_ci is the default collation for Unicode in MySQL, if you want to use utf8_unicode_ci then you end up having to specify it in a lot of places.

For example, all client connections not only have a default charset (makes sense to me) but also a default collation (i.e. the collation will always default to utf8_general_ci for unicode).

Likely, if you use utf8_unicode_ci for your fields, your scripts that connect to the database will need to be updated to mention the desired collation explicitly -- otherwise queries using text strings can fail when your connection is using the default collation.

The upshot is that when converting an existing system of any size to Unicode/utf8, you may end up being forced to use utf8_general_ci because of the way MySQL handles defaults.

George Lund
  • 1,269
  • 12
  • 16
13

For UTF-8 textual information, you should use utf8_general_ci because...

  • utf8_bin: compare strings by the binary value of each character in the string

  • utf8_general_ci: compare strings using general language rules and using case-insensitive comparisons

a.k.a. it will should making searching and indexing the data faster/more efficient/more useful.

mepcotterell
  • 2,670
  • 2
  • 21
  • 28
8

For the case highlighted by Guus, I would strongly suggest using either utf8_unicode_cs (case sensitive, strict matching, ordering correctly for the most part) instead of utf8_bin (strict matching, incorrect ordering).

If the field is intended to be searched, as opposed to matched for a user, then use utf8_general_ci or utf8_unicode_ci. Both are case-insensitive, one will losely match (‘ß’ is equal to ‘s’, and not to ‘ss’). There are also language specific versions, like utf8_german_ci where the lose matching is more suitable for the language specified.

[Edit - nearly 6 years later]

I no longer recommend the "utf8" character set on MySQL, and instead recommend the "utf8mb4" character set. They match almost entirely, but allow for a little (lot) more unicode characters.

Realistically, MySQL should have updated the "utf8" character set and respective collations to match the "utf8" specification, but instead, a separate character set and respective collations as to not impact storage designation for those already using their incomplete "utf8" character set.

SEoF
  • 1,092
  • 14
  • 26
  • 5
    FYI: `utf8_unicode_cs` doesn't not exist. The only case-sensitive utf8 is `utf8_bin`. Problem is `utf8_bin` sorting is incorrect. See: http://stackoverflow.com/questions/15218077/does-a-utf8-unicode-cs-collation-exist/15916398#15916398 – Costa Apr 10 '13 at 03:35
  • 1
    Thanks for updating! – Hashim Aziz May 13 '19 at 18:39
5

I found these collation charts helpful. http://collation-charts.org/mysql60/. I'm no sure which is the used utf8_general_ci though.

For example here is the chart for utf8_swedish_ci. It shows which characters it interprets as the same. http://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html

jiv-e
  • 483
  • 6
  • 8
2

In your database upload file, add the followin line before any line:

SET NAMES utf8;

And your problem should be solved.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
tapos ghosh
  • 2,114
  • 23
  • 37
  • 2
    Read a question: In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I'm thinking it's one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci, and utf8_bin before. – Jitesh Sojitra Aug 09 '16 at 08:45
  • 5
    This answer has nothing to do with the question. Additionally, issuing a `SET NAMES` query directly doesn't let the client know about the encoding and may break certain features such as prepared statements in a very subtle way. – Álvaro González Sep 30 '16 at 09:20