21

I ignorantly used the default latin1_swedish_ci character encoding for all of the varchar rows in my database during development and I've determined that this is the root of the character encoding problems I've been having. In addition to that, it seems like most people these days are recommending that utf8_unicode_ci be used.

I'd like to convert the character encoding for all rows in my database from latin1_swedish_ci to utf8_unicode_ci, but the only way I know how to do is is change it row-by-row in phpMyAdmin, which is really time consuming.

Is there a faster way, such as a query that can be run that changes the collation of all varchar/text rows from latin1_swedish_ci to utf8_unicode_ci?

Nate
  • 26,164
  • 34
  • 130
  • 214

2 Answers2

43

If the columns are using the default table character set then it's just one query per table to convert:

ALTER TABLE t CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

If the character set is set individually on each column, AFAIK there is no way to do that on all columns of all tables in the database directly in MySql, but you could write a tiny program in your language of choice that does so.

Your program would query the INFORMATION_SCHEMA.COLUMNS table and look at the CHARACTER_SET_NAME column:

SELECT * FROM `INFORMATION_SCHEMA.COLUMNS`
WHERE TABLE_SCHEMA = 'dbname' AND CHARACTER_SET_NAME = 'latin1'

For each result row it's trivial to synthesize and execute an ALTER TABLE query on the spot that changes the character set and collation appropriately:

ALTER TABLE t MODIFY col TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

In the above query t, col and TEXT would be the values of the TABLE_NAME, COLUMN_NAME and DATA_TYPE columns from the INFORMATION_SCHEMA.COLUMNS result set.

Jon
  • 428,835
  • 81
  • 738
  • 806
  • Thank you! it worked great and saved me a huge amount of time! – Nate Aug 29 '13 at 20:57
  • Nice answer. As far as I know, `utf8_unicode_ci` should be used instead of `utf8_general_ci`, as `utf8_unicode_ci` is more accurate. So, usage of `utf8_unicode_ci` should be recommended. http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci – Sudarshan_SMD Jan 27 '16 at 04:49
  • @Sudarshan_SMD you are right, not sure why I used general instead of unicode, especially considering the question says general, and it's been too long now. Thanks for catching it! – Jon Jan 27 '16 at 09:37
10

You can actually do this inside MySQL, using a procedure.

Based on https://stackoverflow.com/a/12718767/1612273. It uses the current database, so make sure you're doing it on the right one!

delimiter //

DROP PROCEDURE IF EXISTS convert_database_to_utf8 //

CREATE PROCEDURE convert_database_to_utf8()
BEGIN
    DECLARE table_name VARCHAR(255);
    DECLARE done INT DEFAULT FALSE;

    DECLARE cur CURSOR FOR
        SELECT t.table_name FROM information_schema.tables t WHERE t.table_schema = DATABASE() AND t.table_type='BASE TABLE';
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN cur;
        tables_loop: LOOP
            FETCH cur INTO table_name;

            IF done THEN
                LEAVE tables_loop;
            END IF;

            SET @sql = CONCAT("ALTER TABLE ", table_name, " CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci");
            PREPARE stmt FROM @sql;
            EXECUTE stmt;
            DROP PREPARE stmt;
        END LOOP;
    CLOSE cur;
END //

delimiter ;
call convert_database_to_utf8();
Community
  • 1
  • 1
arnoud
  • 147
  • 1
  • 7