0

I am using flourishlib for a website. My client requested that we should be able to use emojis with mobile phones. In theory we should change the character-encoding from utf8 to utf8mb4 for the MySQL database.

So far, so good, however, if we make this switch, like this:

# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)

Then each character will use four bytes instead of three bytes. This would increase the database's size with 33%. This would result in worse performance and more storage space used up. So, as a result, we have decided to switch to an encoding of utf8mb4 for only specific columns of specific tables.

To make sure everything is all right, I have checked several things. Among them, I have checked flourishlib and found a few suspect parts:

  1. There is an fUTF8 class, which does not seem to support utf8mb4

  2. At fDatabase I am quoting some findings:

    if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8', $this->connection)) {
        throw new fConnectivityException(
            'There was an error setting the database connection to use UTF-8'
        );
    }
    //...
    // Make MySQL act more strict and use UTF-8
    if ($this->type == 'mysql') {
        $this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
        $this->execute("SET NAMES 'utf8'");
        $this->execute("SET CHARACTER SET utf8");
    }
    
  3. At fSQLSchemaTranslation I can see this:

    $sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8', $sql);
    

I have the suspicion that flourishlib will not support our quest of making a few columns of a few table have a character encoding of utf8mb4. I wonder whether we can upgrade something somehow to make this support. As a worst-case scenario, we can override every textual occurrence of utf8 to utf8mb4. However, that would be a very ugly hack and we wonder whether there is a better solution. Should we make this hack or is there a more orthodox approach?

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175

2 Answers2

1

I have resolved the issue. I have altered the tables where I wanted to support emojis by changing the column character set and collation, like this:

ALTER TABLE table_name CHANGE column_name column_name text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

After that, I had to make a few ugly hacks to make flourishlib able to support emojis.

fDatabase.php:

line 685:

        if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8mb4', $this->connection)) {
            throw new fConnectivityException(
                'There was an error setting the database connection to use UTF-8'
            );
        }

line 717 stays the same, everything crashes if this line is changed:

if ($this->connection && function_exists('mysqli_set_charset') && !mysqli_set_charset($this->connection, 'utf8')) {

line 800:

    // Make MySQL act more strict and use UTF-8
    if ($this->type == 'mysql') {
        $this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
        $this->execute("SET NAMES 'utf8mb4'");
        $this->execute("SET CHARACTER SET utf8mb4");
    }

fSQLSchemaTranslation.php:

line 1554:

$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8mb4', $sql);

fXML.php:

line 403:

    if (preg_replace('#[^a-z0-9]#', '', strtolower($encoding)) == 'utf8mb4') {
        // Remove the UTF-8 BOM if present
        $xml = preg_replace("#^\xEF\xBB\xBF#", '', $xml);
        fCore::startErrorCapture(E_NOTICE);
        $cleaned = self::iconv('UTF-8', 'UTF-8', $xml);
        if ($cleaned != $xml) {
            $xml = self::iconv('Windows-1252', 'UTF-8', $xml);
        }
        fCore::stopErrorCapture();
    }

and finally, when there are modifications for any of the columns affected, I execute this:

App::db()->query("set names 'utf8mb4'");

which, essentially triggers the ->query() execution of an fDatabase object.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
-1

increase the database's size with 33%.

Not true. English letters still take 1 byte each. What you gain with utf8mb4 is the ability to store emoji and some Chinese characters.

You shouldn't need to ALTER ... CHANGE the columns. Except that you probably had a canned VARCHAR(255) which has issues. Don't simply switch to 191, switch to a 'reasonable' number for each column. Or do nothing. The 191 comes only from an INDEX limitation. You are no indexing every column, are you?

fUTF8 class, which does not seem to support

Complain to flourishlib. Or abandon it. (Too many questions in these forums are complaints about inadequate 3rd party packages, not MySQL, itself.)

You might be able to change to utf8mb4 in MySQL and let flourishlib be oblivious to the change. Technically speaking, MySQL's utf8mb4 matches the rest of the world's concept of utf8; MySQL's utf8 is an incomplete implementation.

$this->execute("SET NAMES 'utf8'");

If you can see this code, you can change it.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Rick, the question is tagged with flourishlib. Which means it is flourishlib-specific. Which means that if you have no knowledge about flourishlib, then you are unable to answer. – Lajos Arpad Sep 08 '15 at 07:25
  • It seems that your argument about varchars is valid though. – Lajos Arpad Sep 08 '15 at 07:28
  • "Complain to flourishlib. Or abandon it. (Too many questions in these forums are complaints about inadequate 3rd party packages, not MySQL, itself.)" ummm... The question is tagged flourishlib, so this sentence is completely invalid. Please, read the questions with more attention. I would really like to accept your answers, but if you do not edit them to at least address the questions, then I will be forced to down-vote them. – Lajos Arpad Sep 08 '15 at 07:30
  • Valid points. Ding me if you like. This is the first time I have heard of `flourishlib`, and I see that there are only 17 questions so far. Perhaps my 'answer' will tell someone who knows flourishlib what needs doing for the interface with MySQL. – Rick James Sep 08 '15 at 17:52
  • I have already solved the problem. The solution involved hacks into flourish code. I have to down-vote this answer, your other answer was accepted though. – Lajos Arpad Sep 08 '15 at 19:56