8

Every single post on SO that I have seen to accomplish this suggests running the following SQL:

ALTER TABLE <tablename> CONVERT TO CHARACTER SET utf8  COLLATE utf8_unicode_ci;

The problem with this, unless I am mistaken, is that it explicitly specifies the column collations, so you end up with something like this when you mysqldump the database:

  `address` varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
  `city` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `state` varchar(2) COLLATE utf8_unicode_ci DEFAULT NULL,
  `zipcode` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,

My question is.. is there no way to convert the column collations to the table or database default without doing this?

For example, I have tables that might look like this:

  `address` varchar(150) DEFAULT NULL,
  `city` varchar(100) DEFAULT NULL,
  `state` varchar(2) COLLATE utf8_general_ci DEFAULT NULL,
  `zipcode` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,

What I want, is to convert all columns to utf8_unicode_ci (the table/database default), but not have each column explicitly set to that collation, so that when I mysqldump the converted table, it just looks like this:

  `address` varchar(150) DEFAULT NULL,
  `city` varchar(100) DEFAULT NULL,
  `state` varchar(2) DEFAULT NULL,
  `zipcode` varchar(10) DEFAULT NULL,

with a line at the end of the table creation statement that defines the default character set and collation: ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

bruchowski
  • 5,043
  • 7
  • 30
  • 46

3 Answers3

4

If your table or column is different from the MySQL default, in my case latin1_sweedish_ci, then it will print out the collation with the column. See the following experimentation that demonstrates this.

To set the default character set, see this post.

First, lets create a datbase with two tables. One table has the character set and collation specified.

mysql> create database SO;
mysql> use SO;
mysql> create table test1 (col1 text, col2 text);
mysql> create table test2 (col1 text, col2 text) character set utf8 collate utf8_unicode_ci;

Now check the show create table to see what it looks like:

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` text,
      `col2` text
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    +-------+-----------------+
    1 row in set (0.00 sec)

mysql> show create table test2;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test2 | CREATE TABLE `test2` (
      `col1` text COLLATE utf8_unicode_ci,
      `col2` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

We see that test2 already looks like the columns are specified specifically rather than using the default. I suspect if it's different from the MySQL default it will list it rather than if it's different from the table default. Let's now see how they look in the information_schema database.

mysql> select table_schema, table_name, table_collation from information_schema.tables where table_schema = 'SO';
    +--------------+------------+-------------------+
    | table_schema | table_name | table_collation   |
    +--------------+------------+-------------------+
    | SO           | test1      | latin1_swedish_ci |
    | SO           | test2      | utf8_unicode_ci   |
    +--------------+------------+-------------------+
    2 rows in set (0.00 sec)

mysql> select table_schema, table_name, column_name, character_set_name, collation_name from information_schema.columns where table_schema = 'SO';
    +--------------+------------+-------------+--------------------+-------------------+
    | table_schema | table_name | column_name | character_set_name | collation_name    |
    +--------------+------------+-------------+--------------------+-------------------+
    | SO           | test1      | col1        | latin1             | latin1_swedish_ci |
    | SO           | test1      | col2        | latin1             | latin1_swedish_ci |
    | SO           | test2      | col1        | utf8               | utf8_unicode_ci   |
    | SO           | test2      | col2        | utf8               | utf8_unicode_ci   |
    +--------------+------------+-------------+--------------------+-------------------+
    4 rows in set (0.00 sec)

It looks like the columns have a specific character set and collation regardless of if we specified it. Lets update test1 to the prefered character set and collation and see what happens.

mysql> ALTER TABLE test1 CONVERT TO CHARACTER SET utf8  COLLATE utf8_unicode_ci;
    Query OK, 0 rows affected (0.05 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` mediumtext COLLATE utf8_unicode_ci,
      `col2` mediumtext COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

mysql> show create table test2;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test2 | CREATE TABLE `test2` (
      `col1` text COLLATE utf8_unicode_ci,
      `col2` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

Now they're both putting the collation in the show create table statement. Let's check the information_schema again.

mysql> select table_schema, table_name, table_collation from information_schema.tables where table_schema = 'SO';
    +--------------+------------+-----------------+
    | table_schema | table_name | table_collation |
    +--------------+------------+-----------------+
    | SO           | test1      | utf8_unicode_ci |
    | SO           | test2      | utf8_unicode_ci |
    +--------------+------------+-----------------+
    2 rows in set (0.00 sec)

mysql> select table_schema, table_name, column_name, character_set_name, collation_name from information_schema.columns where table_schema = 'SO';
    +--------------+------------+-------------+--------------------+-----------------+
    | table_schema | table_name | column_name | character_set_name | collation_name  |
    +--------------+------------+-------------+--------------------+-----------------+
    | SO           | test1      | col1        | utf8               | utf8_unicode_ci |
    | SO           | test1      | col2        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col1        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col2        | utf8               | utf8_unicode_ci |
    +--------------+------------+-------------+--------------------+-----------------+
    4 rows in set (0.00 sec)

Looks to be all about the same. But what happens when we add an extra column to both tables?

mysql> alter table test1 add column col3 text;
    Query OK, 0 rows affected (0.05 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table test2 add column col3 text;
    Query OK, 0 rows affected (0.06 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` mediumtext COLLATE utf8_unicode_ci,
      `col2` mediumtext COLLATE utf8_unicode_ci,
      `col3` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

mysql> show create table test2;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test2 | CREATE TABLE `test2` (
      `col1` text COLLATE utf8_unicode_ci,
      `col2` text COLLATE utf8_unicode_ci,
      `col3` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

In both cases, they picked up the collation from the table. So there shouldn't be much worry about a column added later being out of whack. Let's check the information_schema one more time...

mysql> select table_schema, table_name, table_collation from information_schema.tables where table_schema = 'SO';
    +--------------+------------+-----------------+
    | table_schema | table_name | table_collation |
    +--------------+------------+-----------------+
    | SO           | test1      | utf8_unicode_ci |
    | SO           | test2      | utf8_unicode_ci |
    +--------------+------------+-----------------+
    2 rows in set (0.00 sec)

mysql> select table_schema, table_name, column_name, character_set_name, collation_name from information_schema.columns where table_schema = 'SO';
    +--------------+------------+-------------+--------------------+-----------------+
    | table_schema | table_name | column_name | character_set_name | collation_name  |
    +--------------+------------+-------------+--------------------+-----------------+
    | SO           | test1      | col1        | utf8               | utf8_unicode_ci |
    | SO           | test1      | col2        | utf8               | utf8_unicode_ci |
    | SO           | test1      | col3        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col1        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col2        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col3        | utf8               | utf8_unicode_ci |
    +--------------+------------+-------------+--------------------+-----------------+
    6 rows in set (0.00 sec)

Yeah. All looks like it's working the same way. But what about that hypothesis about it only displaying if it is different from the MySQL default as opposed to the table default? Let's set test1 back to what it used to be.

mysql> ALTER TABLE test1 CONVERT TO CHARACTER SET latin1  COLLATE latin1_swedish_ci;
    Query OK, 0 rows affected (0.02 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` mediumtext,
      `col2` mediumtext,
      `col3` text
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    +-------+-----------------+
    1 row in set (0.00 sec)

Seems to look just like when we started. Now to deomonstrate that it is the MySQL default and not just the database default, let's set the default for the database.

mysql> Alter database SO default character set utf8 collate utf8_unicode_ci;
    Query OK, 1 row affected (0.00 sec)

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` mediumtext,
      `col2` mediumtext,
      `col3` text
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    +-------+-----------------+
    1 row in set (0.00 sec)

mysql> show create table test2;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test2 | CREATE TABLE `test2` (
      `col1` text COLLATE utf8_unicode_ci,
      `col2` text COLLATE utf8_unicode_ci,
      `col3` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

As you can see, test1 is still looking like when we first started and the show create table is not affected by the database default.

Community
  • 1
  • 1
Cohan
  • 4,384
  • 2
  • 22
  • 40
0

I have just done some testing and it appears to me that the COLLATE is only included (in the mysqldump file) if it is different from the table COLLATE setting. So if you change the column and the table to be the same then it will not be output in the mysqldump file.

With different settings

CREATE TABLE `test` (
  `id` mediumint(9) NOT NULL AUTO_INCREMENT,
  `code` varchar(10) CHARACTER SET utf8 NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin7;

With the same settings (latin7)

CREATE TABLE `test` (
  `id` mediumint(9) NOT NULL AUTO_INCREMENT,
  `code` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin7;

These tests were done on the same tables, just by changing the collation on the column.

Mark B
  • 649
  • 5
  • 11
  • your example shows a difference in character set, not collation; regardless, my tests output an explicit `COLLATE` clause for every field that allows collation, and the table default collation is the same – bruchowski Jun 15 '15 at 23:09
  • Yes that is true, on my system with MySQL 5.6.17 using phpmyadmin I get the option to alter Collation, that is what I did for the testing, however if I use SQL Buddy, I get the option to alter the charset, but neither give me both. Presumably to avoid the possible mismatch between charset and collation. Both phpmyadmin and sqlbuddy export behaviour is the same as mysqldump in that they only put charset in and then only when it is different to the table default. Maybe it is a version or platform difference, I am using Windows versions. – Mark B Jun 15 '15 at 23:51
0

If you redefine the character set to its value but without specifying collation, the collation is cleared, unless there are other collations in effect. You may have to clear charset and collation all the way up to the table:

select version();
+-----------+
| version() |
+-----------+
| 5.6.17    |
+-----------+


-- start with a table with collation on one column, charset on another
SHOW CREATE TABLE mytable;

CREATE TABLE `mytable` (
  `address` varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
  `test` varchar(20) CHARACTER SET utf8 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- By changing charset and specifying no collation, this is cleared
ALTER TABLE mytable MODIFY address varchar(150) CHARACTER SET utf8;

CREATE TABLE `mytable` (
  `address` varchar(150) CHARACTER SET utf8 DEFAULT NULL,
  `test` varchar(20) CHARACTER SET utf8 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- By specifying nothing we get no charset but default collation
ALTER TABLE mytable MODIFY address varchar(150);

CREATE TABLE `mytable` (
  `address` varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
  `test` varchar(20) CHARACTER SET utf8 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- changing all the fields at once does not help

Success!

-- So we clear the table AND the fields at the same time (a "change" which is no change at all, really)
ALTER TABLE mytable CHARACTER SET utf8, MODIFY address varchar(150), MODIFY test varchar(20);

SHOW CREATE TABLE mytable;

CREATE TABLE `mytable` (
  `address` varchar(150) DEFAULT NULL,
  `test` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

I should also observe that I have server_character_set = utf8 and server_collate = utf8_unicode_ci in my my.cnf (Linux OpenSuSE 13.2)

LSerni
  • 55,617
  • 10
  • 65
  • 107