2

I am trying to import a .csv file into a table. I have figured out how to get the data inserted by using the following query:

LOAD DATA INFILE 'examplesofdata.csv' INTO TABLE coins FIELDS TERMINATED BY ',' 
ENCLOSED BY '' ESCAPED BY '\\'  IGNORE 1 LINES;

However for several of my fields I have Arabic content which gets entered as a series of ? I assume this is because I haven't collated the database correctly or I don't fully understand the LOAD DATA INFILE query. Any advice would be greatly appreciated.

The SHOW CREATE TABLE coins; output is:

CREATE TABLE `coins` (
  `cat_num` int(11) NOT NULL,
  `reg_num` int(11) NOT NULL,
  `period` varchar(255) NOT NULL,
  `arb_period` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
  `ruler` varchar(255) NOT NULL,
  `arb_ruler` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
  `mint` varchar(255) NOT NULL,
  `arb_mint` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
  `date` varchar(255) NOT NULL,
  `weight` float NOT NULL,
  `diameter` float NOT NULL,
  `khedieval_num` varchar(255) NOT NULL,
  `ref` text NOT NULL,
 PRIMARY KEY  (`cat_num`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Krustal
  • 518
  • 1
  • 4
  • 12

7 Answers7

4

LOAD DATA LOCAL INFILE 'filename' INTO TABLE tablename CHARACTER SET utf8 COLUMNS TERMINATED BY '\t' LINES TERMINATED BY '\n';

the CHARACTER SET utf8 does the trick.

Mad Dog Tannen
  • 7,129
  • 5
  • 31
  • 55
user1014390
  • 273
  • 1
  • 2
  • 5
2

This is still a bug with MySQL. However, I found out that the database's default charset is the culprit. There are two possible workarounds:

  1. If you change your database's default charset to LATIN1 then it will work. You can keep your tables/columns UTF-8.
  2. Strangely, if you use the "CHARACTER SET latin1" it will work for both UTF-8 and Latin1 tables/columns. With this method, you can keep your db/table/column charset on UTF-8.
Ngan
  • 377
  • 2
  • 11
1

So I ended up getting an answer from an old instructor for my Databases class. He told me that this problem is actually a reported bug with the current version of MySQL and that the only known solution at the time is to manually import the data through PHP or another scripting language.

The bug for this issue is at: http://bugs.mysql.com/bug.php?id=10195

It didn't help me too much since I was only working on a prototype, and managed a workaround in the mean time, but hopefully it can be of more use to you.

Krustal
  • 518
  • 1
  • 4
  • 12
  • That bug applied to Version 4.1, which did not have the `CHARACTER SET` clause on `LOAD DATA`. But the bug report provides a workaround. Hopefully no one reading this thread is still using 4.1; so this should not be considered "The Answer". – Rick James Jun 18 '16 at 08:03
0

I also had that issue, but instead of series of ?, I was getting truncated data.

Like "aeióu" was being truncated in "aei".

Check the solution I came up with here, you need to match the CSV charset with the LOAD DATA INFILE charset.

Cheers

Nuno Costa
  • 1,210
  • 11
  • 12
0

How about setting CHARACTER SET utf8_unicode or to your locale?

YOU
  • 120,166
  • 34
  • 186
  • 219
  • When I try: LOAD DATA INFILE 'coinsArab.csv' CHARACTER SET utf8_unicode INTO TABLE coins FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\' IGNORE 1 LINES; I get an error saying there is an error in my sql, near CHARACTER SET... Is it in the wrong place? – Krustal Jan 26 '10 at 03:07
  • imm, could you try with `utf8_unicode_ci`? I mentioned for utf8 encoding basically. correct syntax you could fine here - http://dev.mysql.com/doc/refman/5.1/en/load-data.html – YOU Jan 26 '10 at 03:12
  • I figured out my syntax error, feel pretty stupid, but it still results in ????? ??? instead of whats in the file. When I copy the text from the excel and paste it in manually it works just fine, I don't know if that helps. – Krustal Jan 26 '10 at 06:10
  • do you have phpmyadmin? please take a look some setting there for collations of fields or you could use `SHOW CREATE TABLE coins` as Yada pointed out. – YOU Jan 26 '10 at 06:17
0

Adding CHARACTER SET utf8 to the LOAD DATA statement is the proximate answer. However, two other issues have been brought up...

When trying to use utf8/utf8mb4, if you see Question Marks (regular ones, not black diamonds),

  • The bytes to be stored are not encoded as utf8. Fix this.
  • The column in the database is CHARACTER SET utf8 (or utf8mb4). Fix this.
  • Also, check that the connection during reading is utf8.

When trying to use utf8/utf8mb4, if you see Truncated text,

  • The bytes to be stored are not encoded as utf8. Fix this.
  • Also, check that the connection during reading is utf8.
Rick James
  • 135,179
  • 13
  • 127
  • 222
0

I also found out that your character_set_client needs to be UTF-8 as well:

mysql> show session variables like 'char%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | latin1                        
...

Read mysql docs on how to go about changing that for the who server or just the session only.

Ngan
  • 377
  • 2
  • 11