1

I am importing a .csv into Mysql like so:

LOAD DATA INFILE '(...)/contact_acquisition.csv'
INTO TABLE rs_ui_db.contact_acquisition
CHARACTER SET utf8
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;  

...which throws the following error:

"Data too long for column 'comments' at row 63"

I look at row 63 of the .csv, and "comments" has a length of 115.

The create statement seems OK (with comments char(255)) :

CREATE DATABASE IF NOT EXISTS `my_db` /*!40100 DEFAULT CHARACTER SET utf8 */;
USE `my_db`;

DROP TABLE IF EXISTS `contact_acquisition`;
CREATE TABLE `contact_acquisition` (
  `SID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `pilot` char(255) NOT NULL,
  `contact` char(255) NOT NULL,
  `email` char(255) NOT NULL,
  `tel` char(255) NOT NULL,
  `comments` char(255) NOT NULL,
  PRIMARY KEY (`SID`)
) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8; 

Before marking it as duplicate, please note:

This StackOverflow solution obviously doesn't fit: I am already adding CHARACTER SET utf8 to the LOAD DATA sql.

This other StackOverflow solution seems a bit far-fetched to me, since all the data I am inserting (the create database statement, the create table statement, the CSV, etc...) has been extracted from another server in which the exact same row is allowed. I have used the Mysql Workbench in order to extract creates/csv etc.

I am trying to export automatically from one server to another, so changing the data type in the destination is not an option for me.


Update:

This is the line (obfuscated, but keeping the non-alphanumerical characters):

64,xxxx xxxx,"xxxx xxxx, xxxx xxxx","xxxx.xxxx@xxxx.xxxx, xxxx.xxxx@xxxx-xxxx.xxxx.xxxx",xxxx (xxxx) xxxx xxxx xxxx xxxx xxxx (xxxx),"xxxx xxxx xxxx'xxxx xxxx xxxx xxxx xxxx xxxx xxxx, xxxx xxxx xxxx'xxxx xxxx xxxx'xxxx xxxx xxxx xxxx xxxx"

The part MySQL complains about is: "xxxx xxxx xxxx'xxxx xxxx xxxx xxxx xxxx xxxx xxxx, xxxx xxxx xxxx'xxxx xxxx xxxx'xxxx xxxx xxxx xxxx xxxx". Which is wrapped by ", contains 1 comma and 3 '. Other lines (from the same CSV) that are correctly imported also have these characters. The length seems to be the only difference with other lines.

csv file is in utf8.

Community
  • 1
  • 1
Xavier Peña
  • 7,399
  • 9
  • 57
  • 99
  • Have you checked if a stray comma or line break characters within your data is not messing up with the way MySQL can import the data? – Shadow Oct 05 '16 at 09:56
  • @Shadow This is the line (obfuscated, but keeping the non-alphanumerical characters) `64,xxxx xxxx,"xxxx xxxx, xxxx xxxx","xxxx.xxxx@xxxx.xxxx, xxxx.xxxx@xxxx-xxxx.xxxx.xxxx",xxxx (xxxx) xxxx xxxx xxxx xxxx xxxx (xxxx),"xxxx xxxx xxxx'xxxx xxxx xxxx xxxx xxxx xxxx xxxx, xxxx xxxx xxxx'xxxx xxxx xxxx'xxxx xxxx xxxx xxxx xxxx"\r\n`. The comma or line break chars seem to be correctly set. I have added some more infos to the post, under the section Update. – Xavier Peña Oct 05 '16 at 10:04
  • Set `comments` to a higher character length, import, and then see what length MySQL reports back for this column in this record ...? – CBroe Oct 05 '16 at 10:12
  • What about characters, like form feed, that MySQL may interpret as end of a line? Form feed is not a visible character. Also, the error message mentioned line 63, yet the data seems to come from line 64 (SID may be different, but who knows). – Shadow Oct 05 '16 at 10:13
  • @Shadow The id vs line number discrepancy is because of the header (`IGNORE 1 ROWS`). – Xavier Peña Oct 05 '16 at 10:14
  • @CBroe I've tried to set comments to `char(511)` in the `CREATE` statement, but it seems that it is not allowed: "Column length too big for column 'comments' (max = 255); use BLOB or TEXT instead". – Xavier Peña Oct 05 '16 at 10:18
  • So use a BLOB or TEXT for testing. – CBroe Oct 05 '16 at 10:21
  • `CHAR` cannot exceed 255. you would have to use `VARCHAR` for that (which I would do for comments anyway, but maybe speed ups are a sensible reason here to not do that). – Florian Heer Oct 05 '16 at 10:22
  • If the SID is continuous, then it should be 62, not 64 with the ignore. You should consider checking multiple lines. But the best would be to try to get MySQL to import this data into any table just to check how MySQL sees the imported data. – Shadow Oct 05 '16 at 10:22
  • To be honest, I do not think that you have a programming issue, I think you have a data issue here. – Shadow Oct 05 '16 at 10:24
  • @CBroe With `TEXT` everything works just fine (but as stated in the post, I am trying to export automatically from one server to another, so changing the data type in the destination is not an option for me). – Xavier Peña Oct 05 '16 at 10:51
  • @Shadow Some ids were deleted in the source, this is why it is not completely sequential. Now this is weird: I've changed the problematic `comment` to `"-"` and the error persists. I've changed it to `-` (without quotes, since it does not contain any comma) and it works OK. This has me scratching my head... maybe there is something in the preceding lines, as you suggested. – Xavier Peña Oct 05 '16 at 10:55
  • If it works with text, then check out the record in question and see what was imported as comment. That should give you a clue. – Shadow Oct 05 '16 at 11:07
  • Just a thought, is it not potentially an option to use a dump via mysqldump? Copying from one server to another should work with that easily. And if that results in an error as well, you would have potentially more hints as to what goes wrong. – Florian Heer Oct 05 '16 at 18:54
  • @FlorianHeer I can't use mysqldump because for some of the tables that are too heavy and linked to other tables, I have to do some special procedures. In the end I gave up on the csv's (which I was using for simplicity) and I directly created a system that retrieves the data through an ORM, formats it as efficient SQL inserts, and sends them to the destination server. It's more work but it's faster and apparently I couldn't build the cvs's properly anyway. – Xavier Peña Oct 06 '16 at 17:05

1 Answers1

2

Try with LINES TERMINATED BY '\r\n'

LOAD DATA INFILE '(...)/contact_acquisition.csv'
INTO TABLE rs_ui_db.contact_acquisition
CHARACTER SET utf8
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS;

This works perfectly for me

TomNg
  • 1,897
  • 2
  • 18
  • 25