0

Proclaimer: YES, I've done my search on Stackoverflow and NO it couldn't find an answer for this case.

I'm migrating data from an forum which has some legacy in it's MySQL database. One of the issues is the storage of Emoji's.

Donor database:

-- Server: 5.5.41-MariaDB
CREATE TABLE `forumtopicresponse` (
  `id` int(10) UNSIGNED NOT NULL,
  `topicid` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `userid` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `message` text NOT NULL,
  `created` int(10) UNSIGNED NOT NULL DEFAULT '0',
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

In the message column I've got a message like this: Success!ðŸ‘ðŸ‘, which displays as "Success!"

Laravel target database:

-- Server: MySQL 5.7.x
CREATE TABLE `answers` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `topic_id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `body` text CHARACTER SET utf8mb4,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  ...keys & indexes
) ENGINE=InnoDB AUTO_INCREMENT=1254419 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

In HTML the document has a <meta charset="utf-8"> and to display the field, I'm using

{!! nl2br(e($answer->body)) !!}

And with this it just displays as Success!ðŸ‘👠and not the Emoji's.

Question

How can I migrate this data CLEAN and UTF-8 valid into my fresh database? I think I need some utf encoding, but can't figure out which.

UPDATE! THE SOLUTION

Got it fixed. The only solution was to alter the table in the Donor database.

ALTER TABLE forumtopicresponse CHANGE message message LONGTEXT CHARACTER SET latin1;
ALTER TABLE forumtopicresponse CHANGE message message LONGBLOB;

Do NOT change the LONGBLOB to LONGTEXT anymore: I lost data this way.

When I migrate the LONGBLOB data to the Laravel target database everything get's migrated correctly: all special chars and emoji's are fixed and in UTF-8.

vanderbake
  • 356
  • 2
  • 11

1 Answers1

0

The Emoji is hex F09F918D. That is, it is a 4-byte string.

MySQL's CHARACTER SET = utf8 does not handle 4-byte UTF-8 strings, only 3-byte ones, thereby excluding many of the Emoji and some of Chinese.

When interpreted as latin1, those hex digits are 👠(plus a 4th, but unprintable, character). Showing gibberish like that is called "Mojibake".

So, you have 2 problems:

  • Need to change the storage to utf8mb4 so you can store the Emoji.
  • Need to announce to MySQL that your client is speaking UTF-8, not latin1.

See "Best Practice" in Trouble with UTF-8 characters; what I see is not what I stored

And also see UTF-8 all the way through

Here's my list of fixes, but you must first correctly identify which case you have. Applying the wrong fix makes things worse.

There may be a 3rd mistake -- in moving the data from 5.5 to 5.7. Please provide those details.

Rick James
  • 135,179
  • 13
  • 127
  • 222