2

I've tried importing the following CSV file with UTF-8 encoding into MySQL, but the Chinese characters are not correctly populating. I also changed the column collate to utf8 - default collation and big5-big5_chinese_ci , but the data is still not populating the correct characters.

Data I'm trying to import: Data I'm trying to import into MySQL

SQL Query I'm using to import data:

LOAD DATA LOCAL INFILE 'C:\\Users\\Ulises.Sotomoyor\\Downloads\\Insights.csv' 
INTO TABLE unpaid_media.insights
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"' 
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(@Date, Section, Language, Insights)
SET Date = str_to_date(@Date, '%m/%d/%Y')

MySQL Table Setting/Collation: MySQL Table Setting/Collation

Rick James
  • 135,179
  • 13
  • 127
  • 222
Ulises Sotomayor
  • 159
  • 2
  • 14

2 Answers2

3

You need to specify (at least) two things:

  • The character encoding of the data coming in. -- Do this explicitly by adding CHARACTER SET utf8mb4 (or big5) in the LOAD DATA statement.
  • The CHARACTER SET for the column -- It can be either big5 or utf8mb4. This does not have to match the LOAD.

In general, MySQL users should aim for utf8mb4 throughout, but big5 is OK.

The collation does not matter when inserting unless you have a uniqueness constraint. (It does not seem that you have such.)

If you say the data has one encoding (eg, big5), but the column is in a different encoding (eg, utf8mb4), MySQL will convert as you do the inserting.

You may end up with other troubles. See "Best practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored . It is aimed at utf8, but it can be modified to import big5 and/or display in big5.

If you run into more troubles, please try to get the hex of the strings/columns in question.

Rick James
  • 135,179
  • 13
  • 127
  • 222
2

Query used for the solution:

LOAD DATA LOCAL INFILE 'C:\\Users\\Ulises.Sotomoyor\\Downloads\\Insights.csv'
     INTO TABLE unpaid_media.insights
     CHARACTER SET utf8mb4
     FIELDS TERMINATED BY ',' 
     ENCLOSED BY '"' 
     LINES TERMINATED BY '\r\n'
     IGNORE 1 LINES
     (@Date, Section, Language, Insights)
     SET Date = str_to_date(@Date, '%m/%d/%Y')
Ulises Sotomayor
  • 159
  • 2
  • 14