1

I'm working with a CSV file received from outside my organization, so I have no control over how the file gets created. The file has ~108K records in it and I'm LOADing into a table specifically created to receive this file. There are two problem records in the file (wrong number of commas/fields) which I discovered by executing the LOAD and seeing the errors. "Awesome," I thought, "I'll just fix those two records manually and I'll get a clean LOAD." After re-creating the table (to LOAD into a virgin table) I executed the LOAD of the "fixed" csv. This time 0 (zero) records would LOAD. All I did was find the offending records, get the fields lined up the way they needed to be (to match the other 107,998 records/lines) and saved it. They are the ~58,000th and ~63,000th records in the file. I made no other changes.

As an experiment in my diagnostic process, I tried simply opening the a copy of the original file in my editor (Text Editor which comes standard in the Ubuntu 18.04 LTS distro) and saving it, with no changes. That file would not load either. In other words, the simple act of opening, then saving the file, "ruins it" as far as the MySQL LOAD command is concerned. When I reopen the "altered" file, it opens fine and appears to be a completely normal csv file.

In all my decades of coding and data management, I have never encountered this issue. I don't even know where to start. Clearly, the "save" is altering the file in some way that makes it unusable, but what could it possibly be?

This is my LOAD command: LOAD DATA LOCAL INFILE '/home/[user]/myfile.csv' INTO TABLE temp005 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;

JohnCroc
  • 79
  • 1
  • 9

1 Answers1

0

This sounds to me like your file arrived with Windows style line separators \r\n. Then when you wrote it out with your editor it got UNIX-style line endings: just \n. Or maybe vice versa. You should be able to tell by looking at before-and-after byte counts for the file.

Or you may be able to use an editor that has a way to show no printable characters. Notepad++ does that.

At any rate, figure out how to tell your editor to leave the line endings unchanged and you should be good to go.

Or change LINES TERMINATED BY '\r\n' to LINES TERMINATED BY '\n' in your LOAD DATA INFILE command.

Edit linux has tools to change line endings. Here's info. How to convert Windows end of line in Unix end of line (CR/LF to LF)

Bad decisions made 45 years ago about line ending compatibility are the cause of this hassle. Windows ≠ Mac ≠ UNIX. WTF? WTF?

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • I thought of that, and it IS likely this file was created with software in a Windows env. But my LOAD command (I've added it to the OP) has LINES TERMINATED BY '\n'. So if anything, it should have errored prior to the fix/save and then been fine after the save. That said, you've motivated me to reload Notepad ++ on my Linux machine and actually look at the non-printable characters. So thank you! – JohnCroc Sep 12 '20 at 22:31
  • So...the original file has [CR] at the end of each line with data and [CR][LF] at the end of each blank line. I forgot to mention the file was double-spaced. So that seems weird. But the in the saved file, all the [LF]s have been stripped out leaving only [CR} at the end of each line. Maybe my LINES TERMINATED BY needs to be '\r\r'? I'll do some more experimenting. – JohnCroc Sep 12 '20 at 22:49
  • Please see my edit. This stuff is a real pain in the xxx neck. – O. Jones Sep 12 '20 at 23:09