9

We've recently gone through the effort of converting all our application's MySQL tables, columns, and .ini settings to utf8 encoding. However, we've found that views and triggers created prior to this change still have a reference to the latin1 character set--i.e. the following queries return records:

SELECT * FROM information_schema.triggers
WHERE trigger_schema=SCHEMA() 
AND    (collation_connection != 'utf8_general_ci' OR character_set_client != 'utf8')
;
SELECT * FROM information_schema.views
WHERE table_schema=SCHEMA()
AND   (collation_connection != 'utf8_general_ci' OR character_set_client != 'utf8')
;

Do I need to be concerned about this?

The MySQL documentation on information_schema.triggers and information_schema.views only says "The session value of the character_set_client system variable when the trigger was created." If that's all the value is storing, then is there any reason to try to fix them? (It doesn't sound like that would matter.) But on the other hand, I have to think the database developers chose to store it in the information_schema tables for some reason.

Production has been on utf8 for a while with the views and triggers still referencing latin1, and we haven't seen any issues (although we don't have a very large non-English user base). I've done some testing with different test strings and haven't seen any character corruption.

Kip
  • 107,154
  • 87
  • 232
  • 265
  • 1
    i amn't an expert in this field, but in my experience, the character set conversion in mysql operations can create wrong characters if tildes are present, for this reason, is better set all the collations and charsets to the same utf-8 or utf-8-spanish-ci if the records are in spanish language. – Jjsg08 Jul 13 '15 at 13:46
  • 1
    I do not know enough about the topic to feel comfortable editing, but when I clicked on this I noticed the title sounds very subjective. Would you agree that "What are the consequences of different character sets for triggers and views" would sound a little more on-topic? The question itself is on-topic, IMHO, but the title doesn't do this justice and again, I'm not sure I'm grasping the right idea. – AdamMc331 Jul 13 '15 at 18:09
  • 1
    @McAdam331 good point, I've changed the title. thanks for the suggestion! – Kip Jul 13 '15 at 18:17
  • @Kip no problem, didn't want to take the matter in to my own hands on this one, since you are handing out a lot of rep for the answer haha. – AdamMc331 Jul 13 '15 at 18:18

3 Answers3

5

See the bolded text in the quote below. If you used non-ASCII characters in your triggers/views, e.g. for the comparison with one of your UTF-8 columns, you're better off recreating them. If not it shouldn't matter since those variables are used to set the context for your object to be used/recreated later.

Quote from the Changes in MySQL 5.1.21 (2007-08-16)

Bugs Fixed

Incompatible Change: Several issues were identified for stored programs (stored procedures and functions, triggers, and events) and views containing non-ASCII symbols. These issues involved conversion errors due to incomplete character set information when translating these objects to and from stored format, such as:

  • Parsing the original object definition so that it can be stored.

  • Compiling the stored definition into executable form when the object is invoked.

  • Retrieval of object definitions from INFORMATION_SCHEMA tables.

  • Displaying the object definition in SHOW statements. This issue also affected mysqldump, which uses SHOW.

The fix for the problems is to store character set information from the object creation context so that this information is available when the object needs to be used later. The context includes the client character set, the connection character set and collation, and the collation of the database with which the object is associated.

As a result of the patch, several tables have new columns:

In the mysql database, the proc and event tables now have these columns: character_set_client, collation_connection, db_collation, body_utf8.

In INFORMATION_SCHEMA, the VIEWS table now has these columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION. The ROUTINES, TRIGGERS, and EVENTS tables now have these columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION.

These columns store the session values of the character_set_client and collation_connection system variables, and the collation of the database with which the object is associated. The values are those in effect at object creation time. (The saved database collation is not the value of the collation_database system variable, which applies to the default database; the database that contains the object is not necessarily the default database.)

aergistal
  • 29,947
  • 5
  • 70
  • 92
  • 1
    So it sounds like if your view or trigger contains a non-ascii character, the system stores those values so it knows how to read the view/trigger SQL, even if character set has changed since then. – Kip Jul 17 '15 at 16:01
  • 1
    I gave you the bounty because this had a lot of useful information. We did end up finding one issue with this when using mysqldump--see my own answer on this question if you're curious. – Kip Jul 17 '15 at 16:40
1

The different character sets shouldn't pose a problem unless you need to convert between formats. Then you need to make sure that there aren't extra or missing characters. One solution I found to this is to convert all data in your table to BLOB s and then change the character set and convert back to TEXT. https://www.percona.com/blog/2013/10/16/utf8-data-on-latin1-tables-converting-to-utf8-without-downtime-or-double-encoding/ has an explanation of one way to convert from latin1 to utf8, if you need more details.

swapneils
  • 149
  • 10
1

I am not sure how much it will help you but this is something we got today( yes TODAY!! ). But we are also in the process of upgrading our databases to support utf8 characters. One of the temp tables we created on the flow did not have the proper error collation compared to the new utf8 format we used and we came across a new error

Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8mb4_general_ci,IMPLICIT) for operation '=', 

Luckily we had : Troubleshooting "Illegal mix of collations" error in mysql

I think these kind of issue will happen if you are comparing two string and their collations are not proper. An extensive test with all possible inputs need to be done.

Also, I would like to hear more on your experience while you moved from Latin to UTF8. Both of us would have faced similar issues :-)

Community
  • 1
  • 1
georgecj11
  • 1,600
  • 15
  • 22