95

I am trying to store a tweet in my MYSQL table. The tweet is:

quiero que me escuches, no te burles no te rias, anoche tuve un sueño que te fuiste de mi vida 🎢🎢

The final two characters are both 'MULTIPLE MUSICAL NOTES' (U+1F3B6), for which the UTF-8 encoding is 0xf09f8eb6.

The tweet_text field in my table is encoded in utf8mb4. But when I try to store the tweet in that column I get the following error message:

Incorrect string value: '\xF0\x9F\x8E\xB6\xF0\x9F...' for column 'tweet_text' at row 1.

What is going wrong? How can I fix this? I need to store multiple languages as well and this character set works for all languages but not for the special characters like emoticons and emojis.

This is my create table statement:

CREATE TABLE `twitter_status_data` (
  `unique_status_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `metadata_result_type` text CHARACTER SET utf8,
  `created_at` text CHARACTER SET utf8 NOT NULL COMMENT 'UTC time when this Tweet was    created.',
  `id` bigint(20) unsigned NOT NULL COMMENT 'Unique tweet identifier',
  `id_str` text CHARACTER SET utf8 NOT NULL,
  `tweet_text` text COMMENT 'Actual UTF-8 text',
  `user_id_str` text CHARACTER SET utf8,
  `user_name` text COMMENT 'User''s name',
  `user_screen_name` text COMMENT 'Twitter handle',
  `coordinates` text CHARACTER SET utf8,
  PRIMARY KEY (`unique_status_id`),
  KEY `user_id_index` (`user_id`),
  FULLTEXT KEY `tweet_text_index` (`tweet_text`)
) ENGINE=InnoDB AUTO_INCREMENT=82451 DEFAULT CHARSET=utf8mb4;
db1
  • 2,939
  • 3
  • 15
  • 13
  • 10
    Have you set the character set of your database connection to `utf8mb4`? – eggyal Dec 05 '13 at 21:52
  • Sorry, What do you mean by character set of database connection? The character set of the database is utf8mb4 and collation is utf8mb4_general_ci – db1 Dec 05 '13 at 22:33
  • I get a similar error message for this tweet as well https://twitter.com/OfficialMansi/status/406007380241969152 The tweet has a smiley at the end. The error message is Incorrect string value: '\xF3\xBE\x8D\x83 h...' for column 'tweet_text' at row 1 – db1 Dec 05 '13 at 22:38
  • http://dev.mysql.com/doc/en/charset-connection.html – eggyal Dec 05 '13 at 23:49
  • 1
    If nothing else works, do you mind just posting the 'create table' statement for this table? – Danack Dec 06 '13 at 02:32
  • I added the create table statement to the original question above. I wonder why it does not show the character set of tweet_text field as utf8mb4. It shows that on the mysql client that i am using – db1 Dec 06 '13 at 14:19
  • The database *connection*. For the old mysql API: `mysql_set_charset('utf8mb4')`. For newer APIs it depends on the API. – deceze Dec 06 '13 at 14:23
  • possible duplicate of [Using utf8mb4 with php and mysql](http://stackoverflow.com/questions/16893035/using-utf8mb4-with-php-and-mysql) – deceze Dec 06 '13 at 14:24
  • When i run this query in mysql, it shows me the character set for the tweet_text field as utf8mb4 SELECT character_set_name FROM information_schema.`COLUMNS` C WHERE table_schema = "dreams_twitter" AND table_name = "twitter_status_data" AND column_name = "tweet_text"; – db1 Dec 06 '13 at 14:26
  • 3
    Not the charset of the column, the ***connection charset!*** When you connect to the database, there's a *connection charset* which tells the database what encoding you'll be sending data in and what encoding you expect back. How are you connecting to the database exactly? – deceze Dec 06 '13 at 14:38
  • I use this statement to set the connection charset to utf8mb4 after i connect to the database: mysql_set_charset($con,'utf8mb4'); – db1 Dec 06 '13 at 14:43
  • I use this statement to set the connection charset to utf8mb4 after i connect to the database: mysqli_set_charset($con,'utf8mb4'); – db1 Dec 06 '13 at 14:52
  • @deceze > the *connection charset!* Thank you deceze you are my savior! – kgf3JfUtW Oct 27 '18 at 20:43

9 Answers9

143

I was finally able to figure out the issue. I had to change some settings in mysql configuration my.ini This article helped a lot http://mathiasbynens.be/notes/mysql-utf8mb4#character-sets

First i changed the character set in my.ini to utf8mb4 Next i ran the following commands in mysql client

SET NAMES utf8mb4; 
ALTER DATABASE dreams_twitter CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci;

Use the following command to check that the changes are made

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
db1
  • 2,939
  • 3
  • 15
  • 13
  • 3
    Side note: If you use *node npm sequelize* + MySQL, you need to set the charset in a special way which seemed to be quirky, at least to me. https://github.com/sequelize/sequelize/issues/395 – treejanitor Jun 15 '16 at 21:38
  • oh my God, :) thanks alot db1. after about 12 hours done with your advice. I've just executed the "set names utf8mb4" at connection creation moment in code then all the things went correctly. – Amin Heydari Alashti Jul 19 '18 at 16:45
  • i executed SET NAMES utf8mb4; query before my insert , but now i am unable to get new Insert Id from prepared statement. please suggest me the solution for this – Onkar Musale May 29 '19 at 06:31
  • I made a script that will convert all tables in a database automagically https://gist.github.com/pobegov/f8b293fb6eb658a13feb1c318e6c07ed – clops Jul 04 '19 at 17:16
  • Just to add the reasoning - In MySQL, UTF-8 can only hold 3 bytes per character. MySQL encoding utf8mb4 is actually the real 4-byte utf8 encoding. – Pankaj Dec 27 '19 at 07:52
22

I had hit the same problem and learnt the following-

Even though database has a default character set of utf-8, it's possible for database columns to have a different character set in MySQL. Modified dB and the problematic column to UTF-8:

mysql> ALTER DATABASE MyDB CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci'

mysql> ALTER TABLE database.table MODIFY COLUMN column_name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

Now creating new tables with:

> CREATE TABLE My_Table_Name (
    twitter_id_str VARCHAR(255) NOT NULL UNIQUE,
    twitter_screen_name VARCHAR(512) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
    .....
  ) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Vishal
  • 1,253
  • 1
  • 11
  • 17
18

It may be obvious, but it still was surprising to me, that SET NAMES utf8 is not compatible with utf8mb4 encoding. So for some apps changing table/column encoding was not enough. I had to change encoding in app configuration.

Redmine (ruby, ROR)

In config/database.yml:

production:
  adapter: mysql2
  database: redmine
  host: localhost
  username: redmine
  password: passowrd
  encoding: utf8mb4

Custom Yii application (PHP)

In config/db.php:

return [
    'class' => yii\db\Connection::class,
    'dsn' => 'mysql:host=localhost;dbname=yii',
    'username' => 'yii',
    'password' => 'password',
    'charset' => 'utf8mb4',
],

If you have utf8mb4 as a column/table encoding and still getting errors like this, make sure that you have configured correct charset for DB connection in your application.

rob006
  • 21,383
  • 5
  • 53
  • 74
  • Redmine - not worked correctly when config applied on already existed DB, so here is migration code to fix it https://www.redmine.org/issues/21398 – Ashen One Apr 03 '20 at 12:05
  • Thanks for posting the Yii solution, it worked. – Skatox Jan 17 '22 at 14:06
18

Change database charset and collation

ALTER DATABASE
    database_name
    CHARACTER SET = utf8mb4
    COLLATE = utf8mb4_unicode_ci;

change specific table's charset and collation

ALTER TABLE
    table_name
    CONVERT TO CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

change connection charset in mysql driver

before

charset=utf8&parseTime=True&loc=Local

after

charset=utf8mb4&collation=utf8mb4_unicode_ci&parseTime=True&loc=Local

From this article https://hackernoon.com/today-i-learned-storing-emoji-to-mysql-with-golang-204a093454b7

Giang
  • 2,384
  • 2
  • 25
  • 26
7

According to the create table statement, the default charset of the table is already utf8mb4. It seems that you have a wrong connection charset.

In Java, set the datasource url like this:

jdbc:mysql://127.0.0.1:3306/testdb?useUnicode=true&characterEncoding=utf-8`.

?useUnicode=true&characterEncoding=utf-8 is necessary for using utf8mb4.

It works for my application.

Nakilon
  • 34,866
  • 14
  • 107
  • 142
Shine Huang
  • 159
  • 1
  • 4
  • this works! depending on the client as well (such as some client understands `utf-8` but some just understands `utf8`, in my case using `utf8mb4` for the char set works). – Hopeless Jul 18 '19 at 04:43
  • This worked for me too. Setting the right charset in the server is not enough. – dcdrns Oct 27 '21 at 13:12
1

FOR SQLALCHEMY AND PYTHON

The encoding used for Unicode has traditionally been 'utf8'. However, for MySQL versions 5.5.3 on forward, a new MySQL-specific encoding 'utf8mb4' has been introduced, and as of MySQL 8.0 a warning is emitted by the server if plain utf8 is specified within any server-side directives, replaced with utf8mb3. The rationale for this new encoding is due to the fact that MySQL’s legacy utf-8 encoding only supports codepoints up to three bytes instead of four. Therefore, when communicating with a MySQL database that includes codepoints more than three bytes in size, this new charset is preferred, if supported by both the database as well as the client DBAPI, as in:

e = create_engine(
    "mysql+pymysql://scott:tiger@localhost/test?charset=utf8mb4")
All modern DBAPIs should support the utf8mb4 charset.

enter link description here

Nick Po
  • 128
  • 7
1

I had use an emoji in my string that was the reason for this error.

So make sure you are not using some incorrect string that is not valid to save into the database.

MD SHAYON
  • 7,001
  • 45
  • 38
0

As others said, it's because you are trying to save a 4 bytes of data into less space.

If you are facing the similar issue in java and don't have the flexibility to change the charset and collate encoding of database than this answer is for you.

you can use the Emoji Java library to achieve the same. You can convert into alias before saving/updating into database and convert back to unicode post save/update/load from database. The main benefit is readability of the text even after the encoding because this library only alias the emoji's rather than whole string.

Shivang Agarwal
  • 1,825
  • 1
  • 14
  • 19
0

I changed MySQL settings and still the same. Finally I used the function utf8_decode() on the string before insert.

Reza
  • 1
  • 1