14

Inserting UTF-8 encoded string into UTF-8 encoded table gives incorrect string value.

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9D\x84\x8E i...' for column 'body_value' at row 1: INSERT INTO

I have a character, in a string that mb_detect_encoding claims is UTF-8 encoded. I try to insert this string into a MySQL table, which is defined as (among other things) DEFAULT CHARSET=utf8

Edit: Drupal always does SET NAMES utf8 with optional COLLATE (atleast when talking to MySQL).

Edit 2: Some more details that appear to be relevant. I grab some text from a PostgreSQL database. I stick it onto an object, use mb_detect_encoding to verify that it's UTF-8, and persist the object to the database, using node_save. So while there is an HTTP request that triggers the import, the data does not come from the browser.

Edit 3: Data is denormalized over two tables:

SELECT character_set_name FROM information_schema.COLUMNS C WHERE table_schema = "[database]" AND table_name IN ("field_data_body", "field_revision_body") AND column_name = "body_value";

>+--------------------+
| character_set_name |
+--------------------+
| utf8               |
| utf8               |
+--------------------+

Edit 4: Is it possible that the character is "to new"? I'm more than a little fuzzy on the relationship between unicode and UTF-8, but this wikipedia article, implies that the character was standardized very recently.

I don't understand how that can fail with "Incorrect string value".

Community
  • 1
  • 1
Letharion
  • 4,067
  • 7
  • 31
  • 42
  • Is the field on that table defined with a UTF-8 charset? – Mike Brant Aug 13 '12 at 15:01
  • possible duplicate of [UTF-8 Database Problem](http://stackoverflow.com/questions/2615611/utf-8-database-problem) – Dan Grossman Aug 13 '12 at 15:03
  • 1
    What does `SELECT character_set_name FROM information_schema.\`COLUMNS\` C WHERE table_schema = "db_name" AND table_name = "table_name" AND column_name = "column_name";` give – Esailija Aug 13 '12 at 15:11
  • MySQL's `utf8` is only the BMP. Its `utf8mb4` corresponds to the outside world's `UTF-8` (and includes 4-byte characters). – Rick James May 30 '19 at 05:38
  • The error is caused by trying to cram 4 bytes into a charset that can't handle it (namely MySQL's "utf8"). – Rick James May 30 '19 at 05:39

4 Answers4

25

(U+1D10E) is a character Unicode found outside the BMP (Basic Multilingual Plane) (above U+FFFF) and thus can't be represented in UTF-8 in 3 bytes. MySQL charset utf8 only accepts UTF-8 characters if they can be represented in 3 bytes. If you need to store this in MySQL, you'll need to use MySQL charset utf8mb4. You'll need MySQL 5.5.3 or later. You can use ALTER TABLE to change the character set without much problem; since it needs more space to store the characters, a couple issues show up that may require you to reduce string size. See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html .

prosfilaes
  • 1,268
  • 13
  • 17
8

to solve this issue, first you change your database field to utf8m4b charset. For example:

ALTER TABLE `tb_name` CHANGE `field_name` `field_name` VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL; 

then in your db connection, set driver_options for it to utf8mb4. For example, if you use PDO

$db = new PDO('mysql:host=localhost;dbname=testdb;charset=utf8mb4', 'username', 'password');

or in zend framework 1.2

$dbParam = array('host' => 'localhost', 'username' => 'db_user_name',
            'password' => 'password', 'dbname' => 'db_name',
            'driver_options' => array(
                '1002' => "SET NAMES 'utf8mb4'",
                '12'    => 0 //this is not necessary
            )
        );
ytdm
  • 1,069
  • 1
  • 12
  • 15
4

In your PDO connecton, set the charset.

new PDO('mysql:host=localhost;dbname=the_db;charset=utf8mb4', $user, $password);
Rick James
  • 135,179
  • 13
  • 127
  • 222
wesside
  • 5,622
  • 5
  • 30
  • 35
3

I fixed the error: SQLSTATE[HY000]: General error: 1366 Incorrect string value ...... with this method:

I use utf8mb4_unicode_ci for database database Set utf8mb4_unicode_ci for all tables tables

Set longblog datatype for column (not text, longtext.... you need big datatype to store 4 bytes of your content) fields

It is okay now. If you use laravel, continue to edit config/database.php

'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',

laravel

If you use function strtolower, replace it with mb_strtolower Notice: you have to put <meta charset="utf-8"> on your head tag

huy
  • 174
  • 1
  • 4