2

Can you someone please provide the best way to convert not only a mysql database and all its tables from latin1_swedish_ci to UTF-8, with their contents? I have been researching all over Stackoverflow as well as elsewhere and the suggestions are always different.

  • Some people suggest just using these commands on the tables and databases:

    ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    
  • Others say that this just changes the database and tables, but not the contents.

  • Some suggest dumping the db, create a new table with the right char set and collation, and importing the old db into that. Does this actually convert the data as well?

    mysqldump --skip-opt --set-charset --skip-set-charset
    
  • Others suggest running iconv against the dumped DB before importing? Is this really needed or would the import into a UTF-8 db do the conversion?

Finally, other suggest altering the database, convert char/blog tables to binary, and the converting back.

There are so many different methods that it has become very confusing.

Can someone please provide a concise step-by-step instruction, or point me to one, on how I can go about convert my latin DBs and their content to UTF-8? Even better if there is a script that automates this process against a database.

Thanks in advance.

Channa
  • 742
  • 17
  • 28
John Tangale
  • 325
  • 2
  • 17

3 Answers3

3

The are two different problems which are often conflated:

  1. change the specification of a table or column on how it should store data internally
  2. convert garbled mojibake data to its intended characters

Each text column in MySQL has an associated charset attribute, which specifies what encoding text stored in this column should be stored as internally. This only really influences what characters can be stored in this column and how efficient the data storage is. For example, if you're storing a ton of Japanese text, sjis as an encoding may be a lot more efficient than utf8 and save you a bit of disk space.

The column encoding does not in any way influence in what encoding data is input and output to/from the database. This is a separate setting, the connection encoding, which is established for every individual client every time you connect to the database. MySQL will convert data on the fly between the connection encoding and the column/table charset as needed. You can connect to the database with a utf8 connection, send it Japanese text destined for an sjis column, and MySQL will convert from utf8 to sjis on the fly (and back in reverse on the way out).

Now, if you've screwed up the connection encoding (as happens way too often) and you've inserted text in a different encoding than your connection encoding specified (e.g. your connection encoding was latin1 but you actually sent UTF-8 encoded data), then you're storing garbage in your database and you need to recover that. If that's your issue, see How to convert wrongly encoded data to UTF-8?.

However, if all your data is peachy and all you want to do is tell MySQL to store data in a different encoding from now on, you only need this:

ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

MySQL will convert the current data from its current charset to the new charset and store future data in the new charset. That's all.

Community
  • 1
  • 1
deceze
  • 510,633
  • 85
  • 743
  • 889
0

Here is an example from the Moodle community:

https://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8

(Scroll down to "Explained".)

The author does first an SQL dump, which is a big SQL file. Then he copies the file. After, he makes coding corrections with sed on the copied file. Finally he imports the copied and corrected SQL dump file back into the database.

I can recommend this because with this single steps it is easy to inspect if they have been done right. If something goes wrong, just go back to the last step and try it another way.

peter_the_oak
  • 3,529
  • 3
  • 23
  • 37
0

Use the MySQL Workbench to handle this. http://dev.mysql.com/doc/workbench/en/index.html

  1. Run the migration wizard to produce a script that will create the database schema.
  2. Edit that script to alter the collation and character set (notepad++ search replace is just fine for this) and the shema name so you don't overwrite the existing database.
  3. Run the script to create the copy under a new name.
  4. Use the migration wizard to bulk transfer the data to the new schema. It will handle all the conversion for you and ensure that your data is still good.
David Soussan
  • 2,698
  • 1
  • 16
  • 19