I have the following CSV file:

textbox6,textbox10,textbox35,textbox17,textbox43,textbox20,textbox39,textbox23,textbox9,textbox16
"Monday, March 02, 2015",Water Front Lodge,"Tuesday, September 23, 2014",,Routine,#1 Johnson Street,Low,Northern Health - Mamaw/Keewa/Athab,Critical Item,4 - Hand Washing Facilities/Practices
"Monday, March 02, 2015",Water Front Lodge,"Thursday, August 01, 2013",,Routine,#1 Johnson Street,Low,Northern Health - Mamaw/Keewa/Athab,General Item,11 - Accurate Thermometer Available to Monitor Food Temperatures
"Monday, March 02, 2015",Water Front Lodge,"Wednesday, February 08, 2012",,Routine,#1 Johnson Street,Low,Northern Health - Mamaw/Keewa/Athab,Critical Item,1 - Refrigeration/Cooling/Thawing (must be 4°C/40°F or lower)
"Monday, March 02, 2015",Water Front Lodge,"Wednesday, February 08, 2012",,Routine,#1 Johnson Street,Low,Northern Health - Mamaw/Keewa/Athab,General Item,12 - Construction/Storage/Cleaning of Equipment/Utensils

And here's what `file` tells me:

Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
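For readers who want to cross-check that report without the `file` utility, here is a minimal Python sketch, not part of the original post. It sniffs the byte-order mark, decodes the bytes, and counts each kind of line terminator. The function name `describe_text_bytes` and the in-memory sample are hypothetical, and the fallback to UTF-8 when no BOM is present is an assumption.

```python
import codecs
import re


def describe_text_bytes(raw: bytes) -> dict:
    """Guess the encoding from a UTF-16 BOM, then count line terminators."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        encoding = "utf-16-le"
        body = raw[len(codecs.BOM_UTF16_LE):]
    elif raw.startswith(codecs.BOM_UTF16_BE):
        encoding = "utf-16-be"
        body = raw[len(codecs.BOM_UTF16_BE):]
    else:
        # Assumption: without a UTF-16 BOM, treat the data as UTF-8.
        # UTF-32 and BOM-less UTF-16 are not handled in this sketch.
        encoding = "utf-8"
        body = raw
    text = body.decode(encoding)
    return {
        "encoding": encoding,
        "crlf": text.count("\r\n"),
        # A CR that is not followed by LF.
        "cr": len(re.findall(r"\r(?!\n)", text)),
        # An LF that is not preceded by CR.
        "lf": len(re.findall(r"(?<!\r)\n", text)),
    }


# Hypothetical sample: two CRLF-terminated lines and one bare CR at the end.
sample = codecs.BOM_UTF16_LE + "a,b\r\n1,2\r\n3,4\r".encode("utf-16-le")
report = describe_text_bytes(sample)
print(report)
```

To inspect a real file, pass `open(path, "rb").read()` instead of the sample. A nonzero `cr` count alongside `crlf` would match the mixed "CRLF, CR" report above.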

I was trying to use Scala-csv to parse it but always get Malformed CSV exceptions. I've uploaded it to CSV Lint and get 5 "unknown errors".

Eyeballing the file, I cannot determine why two separate parsers would fail. It seems to be perfectly ordinary, valid CSV. What about it is malformed?

And yes, I'm aware that it's terrible CSV. I didn't create it -- I just have to parse it.

EDIT: Of note is that this parser also fails.

Kat
  • well some fields use quotes, others don't for a start... – Mitch Wheat Mar 10 '15 at 03:48
  • The complaint is probably due to the file having non-Posix line ends. Posix requires a `newline` line termination. You can use `sed` or the like to add a `newline` after each `carriage return`. (that in fact is what the error says - i.e. a `CRLF` file with only `CR` line terminators). You are missing `newlines` - (char `0xa`) – David C. Rankin Mar 10 '15 at 03:49
  • Your file opens with no issue in Excel. I also don't think that quotes are a problem. **However**, the fields in quotes themselves contain commas. I think the Scala-CSV parser is rolling over because it sees **12** commas on line 2 instead of **9** on the first line containing the header. – Tim Biegeleisen Mar 10 '15 at 03:55
  • @TimBiegeleisen, isn't that the typical way to enclose commas inside fields? By quoting the field. – Kat Mar 10 '15 at 03:59
  • Well Scala-CSV would have to know to ignore those commas. You can try taking the commas out and see if it parses. If not, then I would go after David Rankin's comment with the format of your line returns. – Tim Biegeleisen Mar 10 '15 at 04:01
  • I gave up trying to find a Scala parser after trying the first 3 google hits. I then tried [opencsv](http://opencsv.sourceforge.net/) and it technically works, but it's got multiple blank rows that don't exist in the code and an "11" somehow is length 3. Maybe a weird encoding issue. – Kat Mar 10 '15 at 05:40
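The encoding suspicion in the last comment fits the `file` output, which reports UTF-16. This is one hypothesis and is not confirmed anywhere in this thread. The sketch below, in Python rather than Scala, decodes UTF-16 bytes before handing the text to a CSV parser, and also shows what a byte-oriented read of the same data looks like. The helper name `parse_utf16_csv` is hypothetical, and the sample is trimmed to the first three columns of the file above.

```python
import codecs
import csv
import io


def parse_utf16_csv(raw: bytes) -> list:
    """Decode UTF-16 bytes (BOM required) and parse the result as CSV."""
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    text = raw.decode("utf-16")
    # newline="" leaves CR/LF untouched so the csv module handles them itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Trimmed sample built from the first three columns of the file in the question.
header = "textbox6,textbox10,textbox35"
row = '"Monday, March 02, 2015",Water Front Lodge,"Tuesday, September 23, 2014"'
raw = codecs.BOM_UTF16_LE + (header + "\r\n" + row + "\r\n").encode("utf-16-le")

rows = parse_utf16_csv(raw)
print(rows)

# For contrast: a byte-oriented view of the same data has a NUL after every
# ASCII character, which can show up as odd string lengths or stray rows.
naive = raw.decode("latin-1")
print(repr(naive[:12]))
```

If the real file decodes cleanly this way, the quoted commas are not the problem, since quoting fields that contain commas is standard CSV. The same idea applies to any parser: open the file with an explicit UTF-16 charset instead of the platform default.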

1 Answer

It is definitely the newline. See the Lint results here:
CSV Lint Validation

I copied your CSV and made sure the newline characters were CRLF.
I used Notepad++ (Edit => EOL Conversion => Windows Format) to do the conversion.

Misunderstood
  • I don't think this is it. According to Notepad++, it's already CRLF. I converted to Unix and back to Windows to be sure. [Still failed](http://csvlint.io/validation/54fe81f66373760aaf070000). But this still says CR is being used, so something must be off... – Kat Mar 10 '15 at 05:33
  • As an aside, GitHub is able to display the files fine. Here's a repo with the file in question: https://github.com/MikeHoffert/Testing-CSV. I don't understand how GitHub can display this file fine but multiple dedicated parsers choke. – Kat Mar 10 '15 at 05:37
  • On the Lint Validation Link there is a link to my CSV file. You can download and try it. – Misunderstood Mar 10 '15 at 16:41