
I created a table in MySQL from the macOS command line, using the utf8 charset:

mysql>  CREATE TABLE tb_stu (id VARCHAR(20), name VARCHAR(20), sex CHAR(1), birthday DATE) default charset=utf8;
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW TABLES;
+----------------+
| Tables_in_test |
+----------------+
| pet            |
| tb_stu         |
+----------------+
2 rows in set (0.00 sec)

mysql> show create table tb_stu \G
*************************** 1. row ***************************
       Table: tb_stu
Create Table: CREATE TABLE `tb_stu` (
  `id` varchar(20) DEFAULT NULL,
  `name` varchar(20) DEFAULT NULL,
  `sex` char(1) DEFAULT NULL,
  `birthday` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

I want to add some rows to the tb_stu table. I have a text file containing Chinese strings:

1   小明  男   2015-11-02
2   小红  女   2015-09-01
3   张三  男   2010-02-12
4   李四  女   2009-09-10

The text file is UTF-8 encoded too:

➜  ~ file /Users/lee/Desktop/JAVA/Java从入门到精通/第18章--使用JDBC操作数据库/Example_18_02/tb_stu.txt
/Users/lee/Desktop/JAVA/Java从入门到精通/第18章--使用JDBC操作数据库/Example_18_02/tb_stu.txt: UTF-8 Unicode text

So I ran this statement in the mysql client:

mysql> LOAD DATA LOCAL INFILE '/Users/lee/Desktop/JAVA/Java从入门到精通/第18章--使用JDBC操作数据库/Example_18_02/tb_stu.txt' INTO TABLE tb_stu;
Query OK, 4 rows affected, 4 warnings (0.01 sec)
Records: 4  Deleted: 0  Skipped: 0  Warnings: 4

But the data comes out garbled in MySQL:

mysql> select * from tb_stu;
+------+----------------+------+------------+
| id   | name           | sex  | birthday   |
+------+----------------+------+------------+
| 1    | å°æ˜Ž         | ç    | 2015-11-02 |
| 2    | å°çº¢         | å    | 2015-09-01 |
| 3    | 张三         | ç    | 2010-02-12 |
| 4    | æŽå››         | å    | 2009-09-10 |
+------+----------------+------+------------+
4 rows in set (0.00 sec)

This confuses me: the MySQL table and the text file both use UTF-8, so why do I get garbled text? Thanks a lot!

Princekin
  • That's called Mojibake. Look for that term in https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored to find the cause(s). – Rick James Feb 07 '19 at 01:30

1 Answer


You will need to investigate further to pin down the problem. There are two possibilities. One is that the data was written to the database correctly and is only displayed incorrectly on your command line because of a wrong encoding setting in your environment. The other is that the data was garbled (corrupted) when it was written, which means it is stored incorrectly in the database.

To tell them apart, I suggest taking your original file, where the Chinese characters display properly, and converting its contents to Unicode escape sequences, then doing the same with the data read back from the database and comparing the two. That will show whether the stored data is merely displayed incorrectly or is itself corrupted, and from there you can work out a fix. Here is a tool that can help:

There is an open-source Java library, MgntUtils (written by me), with a utility that converts strings to Unicode escape sequences and vice versa:

String result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);

The output of this code is:

\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World

The library is available on Maven Central and on GitHub. It comes as a Maven artifact, with sources and Javadoc.

Here is javadoc for the class StringUnicodeEncoderDecoder
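
For readers who prefer to compare raw bytes, here is a minimal sketch of the same check using only the Java standard library (no MgntUtils). The class name `MojibakeCheck` and its two helper methods are hypothetical, made up for this illustration. It prints the UTF-8 hex of the original string, which is what `HEX(name)` in MySQL should show if the row was stored correctly, and the hex you would expect if the UTF-8 bytes were misread as a single-byte charset and encoded again. ISO-8859-1 is only a stand-in for that single-byte charset; MySQL's latin1 is closer to Windows-1252, so some bytes in a real double-encoded value may differ.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeCheck {

    // Upper-case hex of the string's UTF-8 bytes, with no separators,
    // i.e. the same layout MySQL's HEX() prints.
    public static String toUtf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // Simulates the typical cause of mojibake: UTF-8 bytes decoded
    // as if each byte were one ISO-8859-1 character (an assumption,
    // used here as a stand-in for the server's single-byte charset).
    public static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u5c0f\u660e"; // the name from row 1 of tb_stu.txt

        // Expected HEX(name) when the row is stored correctly.
        System.out.println(toUtf8Hex(original));
        // prints E5B08FE6988E

        // What the stored bytes look like after being misread and re-encoded.
        System.out.println(toUtf8Hex(misreadAsLatin1(original)));
        // prints C3A5C2B0C28FC3A6C298C28E
    }
}
```

If the hex in the table matches the first line, the data is stored correctly and the problem is on the fetching or display side. If it is twice as long and full of C2/C3 bytes like the second line, the data was garbled on the way in.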

Michael Gantman
  • Instead of getting Unicode codepoints, check what is in the table via `SELECT col, HEX(col) ...`. `小明` is hex `E5B08FE6988E` _if_ correctly stored. In that case, the problem is on the fetching or displaying side. – Rick James Feb 07 '19 at 01:33