0

I'm having a hard time getting the encoding of my mysql-database right. It's not possible to insert some Chinese characters correctly. I'm using xampp on windows and have the following table:

CREATE TABLE `Kanji` (
`ID`    MEDIUMINT NOT NULL PRIMARY KEY AUTO_INCREMENT,
`Kanji` VARCHAR(40) NOT NULL UNIQUE
);

The following insert does not work:

INSERT INTO Kanji (Kanji)
VALUES ('');

When I check the contents of the table in phpMyAdmin, it just displays a '?'. Insertion of other Chinese characters works (e.g. '陸').

Here are the encoding-settings of my database:

character_set_client     | utf8mb4
character_set_connection | utf8mb4
character_set_database   | utf8mb4
character_set_filesystem | binary
character_set_results    | utf8mb4
character_set_server     | utf8mb4
character_set_system     | utf8              
collation_connection     | utf8mb4_general_ci
collation_database       | utf8mb4_general_ci
collation_server         | utf8mb4_general_ci

Any ideas what's wrong?

JoeCool
  • 129
  • 7

1 Answers1

0

This has to do with utf8 not being able to handle 4-byte characters, as explained here:

https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html

To solve this problem I had to:

1) Run the three ALTER statements found in the comments section under the answer marked correct: "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC?

(The ALTER statements are shown below in the test case I created)

2) Set the JDBC charset in MySQL Workbench: From https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html (Also in the test case, below)

----------------------------

Here is the test case I created via MySQL Workbench:

-- I was able to re-produce your case:

create table test.kanji ( col varchar(20) ) ;

INSERT INTO test.kanji (col) VALUES ('');

select * from test.kanji ;
------
? 

---- The solution:

; -- From link 1:
ALTER DATABASE test CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_520_ci;

ALTER TABLE test.kanji CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

ALTER TABLE test.kanji modify col VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci ;


-- From link 2:
set names 'utf8mb4' ; 


select * from test.kanji ;
---------
? <-- Interesting, this did not clean-up the data we inserted.

;
-- Insert a new row now that we are using utf8mb4 in the connection:
INSERT INTO test.kanji (col) VALUES ('');


select * from test.kanji ;
---------
?  <-- The first insert - still "dirty"
 <-- The second insert is correct.


-- In case you want to re-set everything back to utf8:
;
ALTER DATABASE test CHARACTER SET = utf8 COLLATE = utf8_general_ci;
ALTER TABLE test.kanji CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE test.kanji modify col VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci ;