When I create a new MySQL database through phpMyAdmin, I have the option to choose the collation (e.g., default, armscii8, ascii, ... and UTF-8). The one I know is UTF-8, since I always see it in HTML source code. But what is the default collation? What are the differences between these choices, and which one should I use?
-
If you want more accurate sorting, use utf8_unicode_ci. See http://stackoverflow.com/questions/367711/what-is-the-best-collation-to-use-for-mysql-with-php – Jithu.S Jul 19 '12 at 08:22
-
unicode_general_ci is the one recommended in the WordPress Codex: https://codex.wordpress.org/Installing_WordPress – John Jul 25 '16 at 19:05
3 Answers
Collation tells the database how to compare and sort strings. It must match your character set.
If you use UTF-8, the collation should be utf8_general_ci. This sorts in Unicode order (case-insensitively) and works for most languages. It also preserves ASCII and Latin1 order.
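The default is normally the latin1 character set, whose default collation is latin1_swedish_ci. (Since MySQL 8.0, the server default is utf8mb4 with utf8mb4_0900_ai_ci.)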
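As a rough illustration only (Python, not anything MySQL runs), the sketch below sorts the same words two ways: by raw code point, which is roughly what a binary collation does, and case-insensitively, which is roughly what a _ci collation such as utf8_general_ci does. The function names are hypothetical, and the model ignores accents and the finer weighting rules of real collations.

```python
def sort_binary(words: list[str]) -> list[str]:
    # Compare raw code points: every uppercase ASCII letter sorts before any lowercase one.
    return sorted(words)


def sort_case_insensitive(words: list[str]) -> list[str]:
    # Compare case-folded copies, so "Cherry" and "cherry" get the same sort key.
    return sorted(words, key=str.casefold)


words = ["banana", "Cherry", "apple"]
print(sort_binary(words))            # ['Cherry', 'apple', 'banana']
print(sort_case_insensitive(words))  # ['apple', 'banana', 'Cherry']
```

The same three words come out in a different order, which is exactly the kind of difference the collation setting controls.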
-
Do not use any of the utf8 collations. MySQL's utf8 stores at most 3 bytes per code point. The real UTF-8 is called utf8mb4, which allows up to 4 bytes and therefore includes emoji. https://mathiasbynens.be/notes/mysql-utf8mb4 – user1318499 May 08 '16 at 02:40
-
@user1318499 Can you turn your comment into an answer and give more details? – Ortomala Lokni Aug 24 '17 at 07:00
-
I've forgotten most of that stuff now, so I'm not confident writing more, but all the information should be in the link in my comment if you want to make it into an answer yourself. – user1318499 Aug 24 '17 at 12:21
Collation is not actually the default; phpMyAdmin is giving you the default collation as the first choice.
What we're talking about is collation: the rules for comparing and sorting text in the character set your database uses for its text types. Your default option is usually based on regional settings, so unless you're planning to globalize, that's usually peachy-keen.
Collations also determine case and accent sensitivity (e.g., is 'Big' == 'big'? With a case-insensitive (_ci) collation, it is). Check the list of character sets and collations in the MySQL manual for all the options.
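To see the two settings side by side, here is a toy Python model that assumes a collation can be reduced to a key function applied before comparing: case folding makes 'Big' equal 'big', and stripping combining accent marks makes 'café' equal 'cafe'. The names collation_key and strings_equal are hypothetical; real MySQL collations bundle these choices into the collation name rather than taking flags.

```python
import unicodedata


def collation_key(text: str, case_sensitive: bool, accent_sensitive: bool) -> str:
    # Decompose so that "é" becomes "e" followed by a combining accent mark.
    text = unicodedata.normalize("NFD", text)
    if not accent_sensitive:
        # Drop combining marks (Unicode category Mn) so "café" and "cafe" share a key.
        text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    if not case_sensitive:
        # Fold case so "Big" and "big" share a key.
        text = text.casefold()
    return text


def strings_equal(a: str, b: str, case_sensitive: bool = False,
                  accent_sensitive: bool = True) -> bool:
    # Two strings are "equal" under this toy collation if their keys match.
    return (collation_key(a, case_sensitive, accent_sensitive)
            == collation_key(b, case_sensitive, accent_sensitive))


print(strings_equal("Big", "big"))                            # True
print(strings_equal("Big", "big", case_sensitive=True))       # False
print(strings_equal("café", "cafe"))                          # False
print(strings_equal("café", "cafe", accent_sensitive=False))  # True
```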

-
UTF-8 is Unicode. If you're not in an English-speaking country, it's a very good bet to use it. – Eric Aug 05 '09 at 04:08
-
I see. Our project targets the US and the rest of the world, so I think it would be better to use UTF-8, am I right? – bbtang Aug 05 '09 at 04:24
-
"Your default option is usually based on regional settings, so unless you're planning to globalize, that's usually peachy-keen." Can't let this slide. He is talking about phpMyAdmin, so websites. How is the World Wide Web not globalized? If you want to be able to display Chinese characters, for example, regional settings are often not OK. Make sure you use Unicode, even if you are in an English-speaking region. Your visitors may well come from different countries, and it's nice if you can display their names, for example. – Stijn de Witt Feb 03 '14 at 19:54
Short answer: always use utf8mb4 (specifically utf8mb4_unicode_ci) when dealing with collation in MySQL and MariaDB.
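If you create the database yourself rather than through phpMyAdmin, the choice ends up in a CREATE DATABASE statement. Here is a small hypothetical Python helper that builds one. It assumes the MySQL naming convention that a collation's name starts with its character set's name (utf8mb4_unicode_ci belongs to utf8mb4) and uses that only as a sanity check. It is a heuristic: the binary character set, whose only collation is named binary, would be rejected by it.

```python
def create_database_sql(name: str, charset: str = "utf8mb4",
                        collation: str = "utf8mb4_unicode_ci") -> str:
    # Heuristic: MySQL collation names start with their character set's name
    # (e.g. latin1_swedish_ci belongs to latin1), so reject obvious mismatches.
    if not collation.startswith(charset + "_"):
        raise ValueError(f"collation {collation!r} does not belong to charset {charset!r}")
    # Quote the identifier with backticks, doubling any backtick inside the name.
    quoted = "`" + name.replace("`", "``") + "`"
    return f"CREATE DATABASE {quoted} CHARACTER SET {charset} COLLATE {collation};"


print(create_database_sql("my_app"))
# CREATE DATABASE `my_app` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

Passing charset "utf8" together with "utf8mb4_unicode_ci" raises a ValueError, which is the kind of mismatch the check is meant to catch.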
Long answer:
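MySQL’s utf8 encoding is awkwardly named, as it’s different from proper UTF-8 encoding. It stores at most 3 bytes per character, so it cannot represent characters outside the Basic Multilingual Plane (such as most emoji); this lack of full Unicode support can lead to data loss or security vulnerabilities.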
Luckily, MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 which maps to proper UTF-8 and thus fully supports Unicode.
Read the full text here: https://mathiasbynens.be/notes/mysql-utf8mb4
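Before switching, you may want to know whether existing text would even fit in MySQL's legacy utf8 (an alias for utf8mb3). This Python sketch, with hypothetical function names, measures each character's length in real UTF-8 and flags anything longer than 3 bytes, such as emoji, which only utf8mb4 can store.

```python
def max_utf8_bytes(text: str) -> int:
    # Byte length of the longest character once encoded as real UTF-8.
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_mysql_utf8mb3(text: str) -> bool:
    # MySQL's legacy "utf8" (utf8mb3) stores at most 3 bytes per character,
    # i.e. only code points up to U+FFFF; anything longer needs utf8mb4.
    return max_utf8_bytes(text) <= 3


print(fits_mysql_utf8mb3("héllo"))  # True  (é is 2 bytes)
print(fits_mysql_utf8mb3("日本語"))  # True  (3 bytes each)
print(fits_mysql_utf8mb3("hi 😀"))  # False (the emoji is 4 bytes)
```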
As to which specific utf8mb4 collation to choose, go with utf8mb4_unicode_ci so that sorting is always handled properly with minimal or unnoticeable performance drawbacks. See more details here: What's the difference between utf8_general_ci and utf8_unicode_ci?
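The MySQL reference manual illustrates the practical difference with the German ß: utf8mb4_general_ci treats ß as equal to s, while utf8mb4_unicode_ci treats it as equal to ss. The Python sketch below is a toy model of that behavior, not MySQL's actual algorithm. The hypothetical "general" key maps each character to exactly one character, while the "unicode" key lets one character expand to several via str.casefold(). Both keys also strip accents, so Ä compares equal to A in either.

```python
import unicodedata


def _strip_accents(text: str) -> str:
    # Decompose, then drop combining marks, so "Ä" becomes "A".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def general_ci_key(text: str) -> str:
    # Toy "general" collation: exactly one output character per input
    # character, so ß can only map to a single letter (here "s").
    return "".join("s" if ch == "ß" else ch.lower() for ch in _strip_accents(text))


def unicode_ci_key(text: str) -> str:
    # Toy "unicode" collation: one character may expand to several;
    # str.casefold() maps "ß" to "ss".
    return _strip_accents(text).casefold()


print(general_ci_key("Straße") == general_ci_key("STRASE"))   # True
print(unicode_ci_key("Straße") == unicode_ci_key("STRASSE"))  # True
print(general_ci_key("Straße") == general_ci_key("STRASSE"))  # False
```

The per-character mapping is what keeps a "general" collation cheap; allowing expansions is what lets a "unicode" collation get cases like ß right.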
