123

One of the responses to a question I asked yesterday suggested that I should make sure my database can handle UTF-8 characters correctly. How I can do this with MySQL?

Community
  • 1
  • 1
Ben
  • 66,838
  • 37
  • 84
  • 108
  • 4
    I really hope we get a comprehensive answer, covering various MySQL versions, incompatibilities, etc. – Edward Z. Yang Oct 15 '08 at 05:03
  • See also http://stackoverflow.com/questions/2344118/utf-8-general-bin-unicode – tripleee Jun 23 '15 at 10:04
  • 1
    @EdwardZ.Yang -- MySQL 4.1 introduced `CHARACTER SETs`; 5.1.24 messed with the collation of German sharp-s (ß), which was rectified by adding another collation in 5.1.62 (arguably making things worse); 5.5.3 filled out utf8 with the new charset utf8mb4. – Rick James Jan 20 '16 at 03:36
  • 1
    This question is quite same to this one.. Please look at that http://stackoverflow.com/questions/3513773/change-mysql-default-character-set-to-utf-8-in-my-cnf – Nyein Aung Jan 25 '16 at 06:44
  • 1
    It's worth pointing out that most of these answers are just plain wrong. Do not use `utf8`. It only supports up to 3-byte characters. The correct character set you should use in MySQL is `utf8mb4`. – SineSwiper Feb 11 '17 at 22:12

15 Answers15

116

Update:

Short answer - You should almost always be using the utf8mb4 charset and utf8mb4_unicode_ci collation.

To alter database:

ALTER DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

See:

Original Answer:

MySQL 4.1 and above has a default character set of UTF-8. You can verify this in your my.cnf file, remember to set both client and server (default-character-set and character-set-server).

If you have existing data that you wish to convert to UTF-8, dump your database, and import it back as UTF-8 making sure:

  • use SET NAMES utf8 before you query/insert into the database
  • use DEFAULT CHARSET=utf8 when creating new tables
  • at this point your MySQL client and server should be in UTF-8 (see my.cnf). remember any languages you use (such as PHP) must be UTF-8 as well. Some versions of PHP will use their own MySQL client library, which may not be UTF-8 aware.

If you do want to migrate existing data remember to backup first! Lots of weird choping of data can happen when things don't go as planned!

Some resources:

Hendy Irawan
  • 20,498
  • 11
  • 103
  • 114
Owen
  • 82,995
  • 21
  • 120
  • 115
  • 33
    My understanding is that `utf8` within MySQL only refers to a small subset of full Unicode. You should use `utf8mb4` instead to force full support. See http://mathiasbynens.be/notes/mysql-utf8mb4 "For a long time, I was using MySQL’s utf8 charset for databases, tables, and columns, *assuming* it mapped to the UTF-8 encoding described above." – Aaron McDaid Sep 30 '13 at 09:32
  • 9
    MySQL has never had a default character set of UTF-8. 4.1 and 5.x up to the latest 5.7 all use `latin1` and `latin1_swedish_ci` for the default charset and collation. See the "Server Character Set and Collation" page in the MySQL manual for confirmation: https://dev.mysql.com/doc/refman/5.1/en/charset-server.html – Animism Jan 11 '14 at 15:57
  • *(Tim's comment has disappeared! But I think my response here might still be useful to some. Here it is: )* According to [Wikipedia](http://en.wikipedia.org/wiki/UTF-8) the 5- and 6- bytes encodings have been removed. They were never actually used. Unicode never did define a range of characters that used the 5- or 6- byte encodings. I think [this email](http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML018/0330.html) explains it best – Aaron McDaid May 21 '14 at 13:43
  • @AaronMcDaid Yeah, I deleted it after I read that wikipedia article. ;) – Tim Tisdall May 21 '14 at 13:48
  • The most annoying aspect of utf8 in mysql is that 3 or 4 bytes have to be reserved for every possible character in your table. So, `utf8mb4` means you can only save 1/4 the amount of regular English text as `latin1` if you're using the maximum row length. I think that's why a lot of programs have given up and used BINARY if they have a lot of text to save. (wikipedia is an example) – Tim Tisdall May 21 '14 at 13:55
  • 2
    @TimTisdall You need not worry `utf8mb4` taking extra storage when most text is ASCII. Although `char` strings are preallocated, `varchar` strings are not -- see the last few lines on [this documentation page](http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html). For example, `char(10)` will be pessimistically reserve 40 bytes under utf8mb4, but `varchar(10)` will allocate bytes in keeping with the variable length encoding. – Kevin A. Naudé Oct 22 '14 at 21:15
  • 1
    @Kevin I think you misread that. I think the maximum row length is 64k. You can only make a utf8mb4 field 1/4 of that because it had to reserve that amount of space. So, even if it's ASCII you can only insert 16k characters. – Tim Tisdall Oct 22 '14 at 22:24
  • 1
    @TimTisdall Oh, you're talking about upper bounds. Yes, those are lower. Fortunately, current versions of mysql will automatically upgrade from `varchar(n)` to the `text` data type if you attempt to alter a `varchar(n)` field to larger than the feasible byte size (while issuing a warning). An index will also have a lower worst-case upper bound, and that may present other problems. – Kevin A. Naudé Oct 23 '14 at 08:01
  • 1
    The statement "... a default character set of UTF-8" is incorrect, according to MySQL's own documentation: http://dev.mysql.com/doc/refman/4.1/en/charset-unicode-utf8.html - it does not allow 4-byte characters at all. – T.W.R. Cole Dec 29 '15 at 17:40
  • Do not use `utf8`. Use `utf8mb4`. Converting a character set is already costly in terms of table locking and CPU time. Don't bother with an inferior character set that doesn't support all of the characters. – SineSwiper Feb 11 '17 at 22:09
  • Check if MySQL is currently using utf8[mb4] or latin1 using SHOW VARIABLES, e.g., `SHOW VARIABLES WHERE Value LIKE 'latin1%';`. Here's another post on [how to configure utf8mb4 in my.cnf](https://stackoverflow.com/a/24487309/664132). – basic6 Sep 05 '18 at 10:55
  • - It will solve the” from now on “ created tables. NOT for EXIST tables. For them you need to do : ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; Source - https://www.digitalocean.com/community/questions/any-way-to-disable-charset-handshake-and-change-default-server-charset-in-managed-mysql – lingar Jun 17 '20 at 11:34
  • this keeps on changing, utf8mb4_0900_ai_ci is the way to go (as of 07/26/21) – YesItsMe Jul 26 '21 at 04:47
  • Add mine to the voices that complain the instruction to use the collation utf8mb4_unicode_ci is not generally good advice. That is no longer the default collation for utf8mb4 in modern versions of MySQL, and someone should have a specific reason for switching to it. In fact, the linked answer about comparing _general_ci and _unicode_ci has been updated to recommend _not_ to use those collations any more. This answer should be updated as well, indicating that you should almost always use utf8mb4, _and whatever the current default collation is for it._ – GrandOpener Jul 19 '23 at 20:00
48

To make this 'permanent', in my.cnf:

[client]
default-character-set=utf8
[mysqld]
character-set-server = utf8

To check, go to the client and show some variables:

SHOW VARIABLES LIKE 'character_set%';

Verify that they're all utf8, except ..._filesystem, which should be binary and ..._dir, that points somewhere in the MySQL installation.

mahemoff
  • 44,526
  • 36
  • 160
  • 222
Javier
  • 60,510
  • 8
  • 78
  • 126
33

MySQL 4.1 and above has a default character set that it calls utf8 but which is actually only a subset of UTF-8 (allows only three-byte characters and smaller).

Use utf8mb4 as your charset if you want "full" UTF-8.

T.W.R. Cole
  • 4,106
  • 1
  • 19
  • 26
  • 5
    Definitely agree, this is the only correct answer. `utf8` doesn't include chars like emoticons. `utf8mb4` does. Check this for more info on how to update : https://mathiasbynens.be/notes/mysql-utf8mb4 – jibai31 Aug 27 '15 at 09:37
  • @Basti -- Mostly correct (latin1 was the default until just recently), and not complete (does not discuss correctly inserting/selecting utf8-encoded data, nor displaying in html). – Rick James Jan 20 '16 at 03:28
  • Respectfully, @RickJames, Basti said "so far" - I don't remember seeing your answer when I posted this. – T.W.R. Cole Mar 17 '16 at 15:23
  • Alas, there are about 5 distinctly different symptoms of utf8 problems, and about 4 things that programmers do wrong to cause trouble. Most answers point out only _one_ thing that may need fixing. The original question was broad one, so the answer needed all 4. Perhaps Basti was familiar with _one_ symptom for which your one aspect was the solution. – Rick James Mar 17 '16 at 15:58
  • Or, to look at another way, "handle UTF-8 characters correctly" can be read two ways... You read it as needing utf8mb4. I read it as not garbling the text on the way in/out of the database. By your, and Basti's, interpretation, your answer is correct and complete. – Rick James Mar 17 '16 at 16:01
  • 12
    As an aside, I'd like to pause a moment and give the MySQL team a really good, hard stare. o_o WTF were you guys thinking? Do you realize how much confusion you've sown by creating a codepage in your program called "utf8" that isn't actually UTF-8? Goddamn assholes. – T.W.R. Cole Mar 17 '16 at 18:21
23

The short answer: Use utf8mb4 in 4 places:

  • The bytes in your client are utf8, not latin1/cp1251/etc.
  • SET NAMES utf8mb4 or something equivalent when establishing the client's connection to MySQL
  • CHARACTER SET utf8mb4 on all tables/columns -- except columns that are strictly ascii/hex/country_code/zip_code/etc.
  • <meta charset charset=UTF-8> if you are outputting to HTML. (Yes the spelling is different here.)

More info ;
UTF8 all the way

The above links provide the "detailed canonical answer is required to address all the concerns". -- There is a space limit on this forum.

Edit

In addition to CHARACTER SET utf8mb4 containing "all" the world's characters, COLLATION utf8mb4_unicode_520_ci is arguable the 'best all-around' collation to use. (There are also Turkish, Spanish, etc, collations for those who want the nuances in those languages.)

Community
  • 1
  • 1
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • My [_new link_](http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored) on how to debug utf8 problems from the output you get. – Rick James Aug 11 '16 at 18:15
  • Why unicode_520_ci is not the best all around: https://stackoverflow.com/a/49982378/62202 – lsl Aug 02 '18 at 02:17
  • @Louis - And as I implied Spanish and Turkish (as well as Polish) users may not happy. "Best all-around" tends to hurt everyone some. MySQL 8.0 has an even newer "best" collation: [_utf8mb4_0900_ai_ci_](http://mysql.rjweb.org/utf8mb4_collations.html). Alas, again L=Ł. – Rick James Aug 16 '18 at 07:36
5

The charset is a property of the database (default) and the table. You can have a look (MySQL commands):

show create database foo; 
> CREATE DATABASE  `foo`.`foo` /*!40100 DEFAULT CHARACTER SET latin1 */

show create table foo.bar;
> lots of stuff ending with
> ) ENGINE=InnoDB AUTO_INCREMENT=252 DEFAULT CHARSET=latin1

In other words; it's quite easy to check your database charset or change it:

ALTER TABLE `foo`.`bar` CHARACTER SET utf8mb4; /* was: utf8 */
dland
  • 4,319
  • 6
  • 36
  • 60
extraneon
  • 23,575
  • 2
  • 47
  • 51
4

I followed Javier's solution, but I added some different lines in my.cnf:

[myslqd]
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
character_set_server=utf8 

I found this idea here: http://dev.mysql.com/doc/refman/5.0/en/charset-server.html in the first/only user comment on the bottom of the page. He mentions that skip-character-set-client-handshake has some importance.

Vlad Balan
  • 444
  • 5
  • 9
  • 2
    This unloved, zero-vote answer was the only thing that helped me! So it gets my vote, that's for darn sure. `skip-character-set-client-handshake` was the key. – Marcus Feb 20 '19 at 07:39
3

To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. USE ALTER DATABASE.. Replace DBNAME with the database name:

ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;

This is a duplicate of this question How to convert an entire MySQL database characterset and collation to UTF-8?

Community
  • 1
  • 1
Nyein Aung
  • 63
  • 1
  • 6
0

Set your database collation to UTF-8 then apply table collation to database default.

Gaurav Lad
  • 1,788
  • 1
  • 16
  • 30
0

Use the collate utf8mb4 on mysql, add the attribute mysql_enable_utf8mb4 on DBI connection and do the sql command "SET NAMES utf8mb4" after connection to the mysql will make perl handle UTF-8 correctly.

#!/usr/bin/perl
print "Content-type: text/html; charset=UTF-8\n\n";

#use utf8;
#use open ':utf8';
#binmode STDOUT, ":utf8";
#binmode STDIN , ":utf8";
#use encoding 'utf8';

our $dbh = DBI->connect("DBI:mysql:database=$database;host=$servername;port=$port",$username,$password, {PrintWarn => 0, PrintError => 0, mysql_enable_utf8mb4 => 1}) || die;
$dbh->do("SET NAMES utf8mb4");
lynx_74
  • 1,633
  • 18
  • 12
-1

Your answer is you can configure by MySql Settings. In My Answer may be something gone out of context but this is also know is help for you.
how to configure Character Set and Collation.

For applications that store data using the default MySQL character set and collation (latin1, latin1_swedish_ci), no special configuration should be needed. If applications require data storage using a different character set or collation, you can configure character set information several ways:

  • Specify character settings per database. For example, applications that use one database might require utf8, whereas applications that use another database might require sjis.
  • Specify character settings at server startup. This causes the server to use the given settings for all applications that do not make other arrangements.
  • Specify character settings at configuration time, if you build MySQL from source. This causes the server to use the given settings for all applications, without having to specify them at server startup.

The examples shown here for your question to set utf8 character set , here also set collation for more helpful(utf8_general_ci collation`).

Specify character settings per database

  CREATE DATABASE new_db
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

Specify character settings at server startup

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

Specify character settings at MySQL configuration time

shell> cmake . -DDEFAULT_CHARSET=utf8 \
           -DDEFAULT_COLLATION=utf8_general_ci

To see the values of the character set and collation system variables that apply to your connection, use these statements:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

This May be lengthy answer but there is all way, you can use. Hopeful my answer is helpful for you. for more information http://dev.mysql.com/doc/refman/5.7/en/charset-applications.html

Vipin Jain
  • 3,686
  • 16
  • 35
-2

DATABASE CONNECTION TO UTF-8

$connect = mysql_connect('$localhost','$username','$password') or die(mysql_error());
mysql_set_charset('utf8',$connect);
mysql_select_db('$database_name','$connect') or die(mysql_error());
Panda
  • 6,955
  • 6
  • 40
  • 55
-2

This worked for me:

mysqli_query($connection, "SET NAMES 'utf8'");

  • A number of other answers have already suggested this solution. Please ensure you are providing a distinct contribution when answering old questions. – snakecharmerb Aug 15 '21 at 11:53
-3

Set your database connection to UTF8:

  if($handle = @mysql_connect(DB_HOST, DB_USER, DB_PASS)){          
         //set to utf8 encoding
         mysql_set_charset('utf8',$handle);
  }
halfer
  • 19,824
  • 17
  • 99
  • 186
fin
  • 11
  • 1
-3

SET NAMES UTF8

This is does the trick

mak
  • 13,267
  • 5
  • 41
  • 47
Claudio
  • 5,740
  • 5
  • 33
  • 40
  • 3
    While using `SET NAMES UTF8` (or `UTF8mb4`) is correct, you don't explain what it does (character set used for this connection). "This does the trick" sounds like it would solve the problem (make MySQL handle UTF-8 properly), but many MySQL databases are set to latin1 by default, so that wouldn't make it a proper solution. I would [change the default charset](http://stackoverflow.com/a/24487309/664132) and the table charsets to utf8mb4. Really, this answer is rather incomplete, so I downvoted it. – basic6 Jun 30 '14 at 09:53
-3

Was able to find a solution. Ran the following as specified at http://technoguider.com/2015/05/utf8-set-up-in-mysql/

SET NAMES UTF8;
set collation_server = utf8_general_ci;
set default-character-set = utf8;
set init_connect = ’SET NAMES utf8′;
set character_set_server = utf8;
set character_set_client = utf8;
Rory McCrossan
  • 331,213
  • 40
  • 305
  • 339
  • The last two lines are redundant, since the first one already includes those: https://dev.mysql.com/doc/refman/5.0/en/charset-connection.html – DanielM Jul 25 '15 at 09:54
  • Also not a complete solution. The columns need `CHARACTER SET utf8`. `root` will not execute the all-important `init_connect`. – Rick James Jan 20 '16 at 03:30