I have a large CSV file that I am going to load it into a MySQL table. However, these data are encoded into utf-8 format, because they include some non-english characters. I have already set the character set of the corresponding column in the table to utf-8. But when I load my file. the non-english characters turn into weird characters(when I do a select on my table rows). Do I need to encode my data before I load the into the table? if yes how Can I do this. I am using Python to load the data and using LOAD DATA LOCAL INFILE command. thanks
Asked
Active
Viewed 7.1k times
5 Answers
100
Try
LOAD DATA INFILE 'file'
IGNORE INTO TABLE table
CHARACTER SET UTF8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'

Danny Beckett
- 20,529
- 24
- 107
- 134

JMHeap
- 1,186
- 2
- 8
- 4
-
1
-
Oh my, took me so long. Tried everything, it just kept converting utf8 to latin and importing it into a utf 8 table. The encoding option worked wonders. – John Oct 12 '17 at 22:44
-
Basically it's an error of mysql up until the latest version, including MariaDb. If a table or column is UTF8 it needs to automatically take the correct values. Well it does not, you need to specify it and hope you've no mixed table. – John Oct 17 '18 at 21:17
-
CHARACTER SET UTF8 Works like magic. tried many things but this was the solution – venkatesh .b Jan 10 '22 at 16:11
-
Please tell us where in the [documentation](https://dev.mysql.com/doc/refman/8.0/en/load-data.html) it says what the options are for `character set`? I mean, should we write `UTF8` or `UTF-8`? Quoted or not? Case sensitive? – Rodrigo Feb 27 '23 at 22:39
-
20
as said in http://dev.mysql.com/doc/refman/5.1/en/load-data.html, you can specify the charset used by your CSV file with the "CHARACTER SET" optional parameter of LOAD DATA LOCAL INFILE

dweeves
- 5,525
- 22
- 28
-
It does not say that mysql wrongly uses another charset, regardless what column charset you've set ! – John Oct 17 '18 at 21:20
3
Do not need encode your characters in the file, but you need to make sure that your file is encoding at UTF-8 before load this file to database.

BuiXuanThanh
- 31
- 2
2
You should send
init_command = 'SET NAMES UTF8'
use_unicode = True
charset = 'utf8'
when doing MySQLdb.connect() e.g.
dbconfig = {}
dbconfig['host'] = 'localhost'
dbconfig['user'] = ''
dbconfig['passwd'] = ''
dbconfig['db'] = ''
dbconfig['init_command'] = 'SET NAMES UTF8'
dbconfig['use_unicode'] = True
dbconfig['charset'] = 'utf8'
conn = MySQLdb.connect(**dbconfig)
edit: ah, sorry, I see you've added that you're using "LOAD DATA LOCAL INFILE" -- this wasn't clear from your initial question :)

simon
- 15,344
- 5
- 45
- 67
-
-
@miyalys -- yes, it's python as specified in the question... did you downvote for that?! – simon Feb 27 '19 at 00:12
-
...and yes. I tried to undo it but the site sadly prevents me from changing the vote before the answer is edited. So if you edit it in some fashion at some point let me know and then I'll undo it. – miyalys Mar 06 '19 at 18:58
1
Try something like,
LOAD DATA LOCAL INFILE "file" INTO TABLE message_history CHARACTER SET UTF8 COLUMNS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"';
Original Structure,

Hasitha Nanayakkara
- 131
- 1
- 3