8

I can't execute this SQL script:

INSERT INTO `mabase`.`new_table` (`idnew_table`, `name`) VALUES ('2', '');

the error is:

ERROR 1366: Incorrect string value: '\xF0\xA0\xBC\xAD' for column 'name' at row 1 SQL Statement: INSERT INTO mabase.new_table (idnew_table, name) VALUES ('2', '')

My database and table is in utf8 charset and utf8_general_ci collation. Also i tried: utf8_unicode_ci, utf8mb4_general_ci, bg5_cinese_ci, gbk_cinese_ci.

I've tried all that in MySql workbench on windows.

is four byte character. I have problems only with them. Tell me Please how can I save four byte character in mysql.

tshepang
  • 12,111
  • 21
  • 91
  • 136
yaroslav prokipchyn
  • 472
  • 2
  • 8
  • 17
  • possible duplicate of [MySQL throws Incorrect string value error](http://stackoverflow.com/questions/8709892/mysql-throws-incorrect-string-value-error) – Danack Jul 16 '13 at 15:45

2 Answers2

11

Your desired character, U+20F2D, resides in the "CJK Unified Ideographs Extension B" block of Unicode's "Supplementary Ideographic Plane" and therefore was not available in any MySQL Unicode character set prior to v5.5; since v5.5, it is available in the utf8mb4, utf16, utf16le and utf32 character sets.

It is not available in MySQL's big5 or gbk character sets.


Why the utf8 encoding does not work

As documented under Unicode Support:

The initial implementation of Unicode support (in MySQL 4.1) included two character sets for storing Unicode data:

  • ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character.

  • utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character.

These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:

  • Their code values are between 0 and 65535 (or U+0000 .. U+FFFF).

  • They can be encoded with a fixed 16-bit word, as in ucs2.

  • They can be encoded with 8, 16, or 24 bits, as in utf8.

  • They are sufficient for almost all characters in major languages.

Characters not supported by the aforementioned character sets include supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.

In MySQL 5.6, Unicode support includes supplementary characters, which requires new character sets that have a broader range and therefore take more space. The following table shows a brief feature comparison of previous and current Unicode support.

╔══════════════════════════════╦══════════════════════════════════════════════╗
║       Before MySQL 5.5MySQL 5.5 and up                ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║ All Unicode 3.0 characters   ║ All Unicode 5.0 and 6.0 characters           ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║ No supplementary characters  ║ With supplementary characters                ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║ ucs2 character set, BMP only ║ No change                                    ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║ utf8 character set for up to ║ No change                                    ║
║ three bytes, BMP only        ║                                              ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║                              ║ New utf8mb4 character set for up to four     ║
║                              ║ bytes, BMP or supplemental                   ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║                              ║ New utf16 character set, BMP or supplemental ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║                              ║ New utf16le character set, BMP or            ║
║                              ║ supplemental (5.6.1 and up)                  ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║                              ║ New utf32 character set, BMP or supplemental ║
╚══════════════════════════════╩══════════════════════════════════════════════╝

These changes are upward compatible. If you want to use the new character sets, there are potential incompatibility issues for your applications; see Section 10.1.11, “Upgrading from Previous to Current Unicode Support”. That section also describes how to convert tables from utf8 to the (4-byte) utf8mb4 character set, and what constraints may apply in doing so.

Why the big5 encoding does not work

As documented under What problems should I be aware of when working with the Big5 Chinese character set?:

MySQL supports the Big5 character set which is common in Hong Kong and Taiwan (Republic of China). MySQL's big5 is in reality Microsoft code page 950, which is very similar to the original big5 character set.

[ deletia ]

A feature request for adding HKSCS extensions has been filed. People who need this extension may find the suggested patch for Bug #13577 to be of interest.

Why the gbk encoding does not work

As documented under What CJK character sets are available in MySQL?:

Here, we try to clarify exactly what characters are legitimate in gb2312 or gbk, with reference to the official documents. Please check these references before reporting gb2312 or gbk bugs.

  • For a complete listing of the gb2312 characters, ordered according to the gb2312_chinese_ci collation: gb2312

  • MySQL's gbk is in reality “Microsoft code page 936”. This differs from the official gbk for characters A1A4 (middle dot), A1AA (em dash), A6E0-A6F5, and A8BB-A8C0.

  • For a listing of gbk/Unicode mappings, see http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT.

  • For MySQL's listing of gbk characters, see gbk.

eggyal
  • 122,705
  • 18
  • 212
  • 237
  • Hello Thanks for so good answer, but i steel need help. I've tried this script on server 5.5 and 5.7 alpha and it fails: CREATE SCHEMA `new_schema` DEFAULT CHARACTER SET utf8mb4 ; CREATE TABLE `new_schema`.`new_table` ( `idnew_table` INT NOT NULL , `new_tablecol` VARCHAR(45) CHARACTER SET 'utf8mb4' NULL , PRIMARY KEY (`idnew_table`) ) DEFAULT CHARACTER SET = utf8mb4; INSERT INTO `new_schema`.`new_table` (`idnew_table`, `new_tablecol`) VALUES ('1', ''); I've execute it from workbench. – yaroslav prokipchyn Jul 17 '13 at 07:12
  • 1
    @yaroslavprokipchyn: I think you'll find it's Workbench that can't cope with supplementary characters. I believe it uses UTF-8 internally and so sets `utf8` as the [charset of its database connections](http://dev.mysql.com/doc/en/charset-connection.html) (even though it probably can support `utf8mb4`, which suggests this is likely a bug) and this is why MySQLd is complaining that the received string is invalid. You could try `SET NAMES 'utf8mb4';`, but Workbench may not like you trying to override its expected character set. Otherwise `INSERT INTO new_table VALUES (1, _utf8mb4'');`. – eggyal Jul 18 '13 at 06:34
  • 1
    A great answer with rich reference, surely deserves more upvotes. – Danubian Sailor Feb 04 '14 at 13:36
0

These 2 commands will support chinese charachters in your database.

ALTER DATABASE CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci'

ALTER TABLE convert to character set DEFAULT COLLATE DEFAULT

Short and simple.

hope it helps

Abhishek Goel
  • 18,785
  • 11
  • 87
  • 65