-1

So my question is duplicate or mundane but i really can use some help here. I want to store emojis and some special characters in specific columns of 3 tables. My application is a workflow engine Processmaker running inside docker. This application talks with mysql server running as another docker container.

To solve the issue i have followed below links and did the changes too but all in vain. The databases are created dynamically.

  1. How to store Emoji Character in MySQL Database
  2. https://dba.stackexchange.com/questions/153720/mysql-unable-to-store-emoji-in-utf8mb4-collation

Please correct me if i am wrong. Even if i want to save emoji into 3 tables i have to change the collation and charset of :

  1. The Database.
  2. Associated tables in which i want to store emojis.
  3. Associated columns of those tables which would hold those emoji characters.

I achieve above steps (1-3) by running mysql queries after database creation, like so :

    ALTER DATABASE database CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci; 

    ALTER TABLE table_one CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    ALTER TABLE table_one CHANGE column_one column_one mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
    ALTER TABLE table_two CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    ALTER TABLE table_two CHANGE column_two column_two mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    SET NAMES utf8mb4;

for changing the collation and charset i also tried the approach of configuring it in my.cnf as such ( to achieve step 1 - step 2 ) :

    [mysqld]
    character-set-server = utf8mb4
    collation-server = utf8mb4_unicode_ci
    init_connect='SET NAMES utf8mb4'
    character-set-client-handshake = FALSE

    [mysql]
    default-character-set = utf8mb4


    [client]
    default-character-set = utf8mb4

All of above steps doesn't seem to work. Also the mysql server can have lot of databases which are created dynamically. Right after the database creation i run below query :

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

and i get this result : some collation and charset info

My sql client also uses utf8mb4 charset and collation as utf8mb4_unicode_ci. Since there could be hundreds of databases and it isn't possible for me to go and run these queries manually for each of them

So i was thinking if i can run some queries right after database creation OR Is there a way to do it ?

Mysql server version - 5.6.48 running in docker. If you need further info kindly let me know.

Any help would be much appreciated.

Thanks in advance.

Pawan Saxena
  • 521
  • 1
  • 4
  • 14

1 Answers1

1

my.cnf (and other global settings) feed into GLOBAL VARIABLES.

When you connect to the database, defaults are taken from GLOBAL VARIABLES. Any parameters on the connection override them. (Example: Most programming language clients allow you to provide CHARACTER SET and COLLATION during the establishment of the connection.)

An alternative to establishing the values at the time of connecting is to run this right after connecting:

 SET NAMES utf8mb4;

SHOW VARIABLES displays the settings provided when you logged in, plus any session changes made since then.

While you are connected, SET GLOBAL VARIABLES will change the GLOBALs, but not the SESSIONs.

While you are connected, SET SESSION VARIABLES will change the SESSION values, but will be lost at logout.

This my.cnf setting is both good and bad; I recommend avoiding it:

init_command = SET NAMES utf8mb4

It says to run that SET in any session that is starting up. But, 'root' skips this step! (That is probably to allow root to connect without having to fix my.cnf.)

You must make sure the client is encoding things with UTF-8. The details vary widely. Fortunately, clients are moving that direction; but they are not all there yet.

So...

Plan A

  • Set up my.cnf with the 3 char* values and (optionally, the one collation).
  • Don't bother with anything on CREATEs. (Doing so could override my.cnf)
  • But be sure to create the database, then the table. (If you already had a database or table in the system with the 'wrong' charset, it will override my.cnf.)

Plan B (This is overkill, but safe.)

  • Always establish the charset when connecting (via either connection parameters or SET NAMES)
  • Always explicitly specify charset on CREATE DATABASE and CREATE TABLE.

Miscellany

A further note: Stored Routines take on the charset for the database.

CHARACTER SET controls encoding. That is critical for Emojy; utf8mb4 is the only solution.

COLLATION deals with ordering and deciding whether one emoji is "equal to" another. This is rarely an issue for Emoji, but is a big deal for text. For text it controls case folding (is "A"="a") and accent stripping, plus other issues.

If the tables are created with utf8mb4, there is no need for ALTER TABLE. And if there is no explicit DEFAULT CHARACTER SET on the table definition, it will inherit from the CREATE DATABASE, etc.

If something goes wrong, Trouble with UTF-8 characters; what I see is not what I stored may help you figure out where things went wrong.

This is a roadmap on which ALTER is needed to repair a table: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

utf8mb4_unicode_ci is a quite old collation. utf8mb4_unicode_520_ci is better. MySQL 8.0 has an even newer one.

My point is that you should not have to do any ALTERs if things are initialized "correctly". Clearly, since "latin1" show in that screen shot, things were not set up "correctly".

The inheritance hierarchy is:

  1. my.cnf and mysqld args (sets GLOBAL VARIABLES)
  2. connect (Copies GLOBAL to SESSION, then overrides with connection parameters) 3.SET NAMES (overrides 3 SESSION VARIABLES)
  3. CREATE DATABASE (Establish inheritable defaults for the database)
  4. CREATE TABLE (Establish inheritable defaults for the table)
  5. Attributes on column definitions.
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks a ton @rick-james. This information is better than most answers for this problem. Can you help me identify somepoints : for database character-set-server and collation-server can be set using my.cnf Is it possible that i can set character_set_database and collation_database too from my.cnf ? – Pawan Saxena Aug 18 '20 at 22:11
  • 1
    @PawanSaxena - Since you are using Docker, it should be easy enough to experiment. Alas, even after a decade of chasing charset issues, I cannot explain what each of the 7 my.cnf `character_set_%` settings does. 3 of them are dynamically settable by `SET NAMES`. Those 3 seem to be necessary and sufficient seem to handle the issues discussed here. (The other 4 deal with system tables, filesystem, etc.) – Rick James Aug 18 '20 at 22:21
  • Thanks @rick-james. I got it figured out. I also found this additional piece of info. https://dev.mysql.com/doc/refman/8.0/en/charset-server.html – Pawan Saxena Aug 18 '20 at 22:39
  • @PawanSaxena - That page says that the "database" defaults come from the "server" variables, not the "database" variables. Confusing. – Rick James Aug 18 '20 at 22:50
  • For me too it was confusing, but i am reading through next pages now. – Pawan Saxena Aug 18 '20 at 23:19