0

Hi I am trying to insert some Chinese characters in my sql database. For some character I am not getting any issues, but for the 2nd insert 2nd character I am getting error as shown below. Has anyone faced this kind of issue before. am I choosing wrong character set or collation?

Reference https://dev.mysql.com/doc/refman/5.7/en/faqs-cjk.html#faq-cjk-why-cjk-fail-searches

CREATE TABLE testing (test VARCHAR(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
INSERT INTO testing VALUES('薛');
INSERT INTO testing VALUES('薛萍'); -- ERROR - MySQL Database Error: Incorrect string value: '\xF0\xA6\x86\xB1\xE8\x90...' for column 'test' at row 1
select * from testing;
Jigar Naik
  • 1,946
  • 5
  • 29
  • 60

2 Answers2

2

According to Unicode character inspector the UTF encoding for 薛萍 is:

薛 = E8 96 9B
 = F0 A6 86 B1
萍 = E8 90 8D

MySQL complaints about this:

\xF0\xA6\x86\xB1\xE8\x90

So everything is apparently correct, save for a little implementation detail about utf8_unicode_ci which, in MySQL, is an incomplete UTF-8 encoding that only accepts up to three byte characters. Thus cannot be stored as utf8_unicode_ci.

You need to switch to some utf8mb4_... encoding.

Álvaro González
  • 142,137
  • 41
  • 261
  • 360
  • still not working, have you tried drop table testing; CREATE TABLE testing (test VARCHAR(500) CHARACTER SET utf8mb4); INSERT INTO testing VALUES('薛萍'); – Jigar Naik Oct 31 '17 at 10:33
  • Are you getting the same exact error message? Have you configured your database connection, in whatever client language or framework you use, for utf8mb4 instead of utf8? – Álvaro González Oct 31 '17 at 11:42
  • No i havent changed anything at database level, i just changed the character set at table level as showed in my script. – Jigar Naik Oct 31 '17 at 12:04
  • No, you don't need to change anything at database level. Just configure whatever client language or framework you use for utf8mb4 instead of utf8. Are you connecting from the command line? Is it a Java program? Something else? – Álvaro González Oct 31 '17 at 12:58
  • I am trying to execute using Toad for SQL Developer. Executing direct insert statements. – Jigar Naik Oct 31 '17 at 13:02
  • I'm not familiar with that software, sorry. In the official command-line client `CREATE TABLE testing (test VARCHAR(500) CHARACTER SET utf8mb4); INSERT INTO testing VALUES('薛萍');` runs flawlessly. – Álvaro González Oct 31 '17 at 13:26
0

In addition to having the column set to utf8mb4, you must also tell MySQL that the client is speaking utf8mb4 (not just utf8). For some clients, this is best done when making the connection, for some there is a secondary command, such as SET NAMES utf8mb4 which should be performed right after connecting.

Debugging Q&A: Trouble with UTF-8 characters; what I see is not what I stored

That seems to be a rather unusual (or new?) Chinese character; even http://unicode.scarfboy.com/?s=f0a686b1 fails to show the graphic.

Rick James
  • 135,179
  • 13
  • 127
  • 222