I know there are lots of similar questions, but I haven't found a solution yet. I'm trying to use the CSV parsing library with Ruby 1.9.1 but I keep getting:

/usr/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift': Illegal quoting in line 1. (CSV::MalformedCSVError)

My CSV files were created in Windows 7 but it's Ubuntu 12.04 that I'm using to run the Ruby script, which looks like this:

require 'csv'

CSV.foreach('out.csv', :col_sep => ';') do |row|
   puts row
end

Nothing complicated, just a test, so I assumed it must be the Windows control characters causing problems. Vim shows this:

"Part 1";;;;^M
;;;;^M
;;;;^M
Failure to Lodge Income Tax Return(s);;;;^M
NAME;ADDRESS;OCCUPATION;"NO OF CHARGES";"FINE/PENALTY £"^M
some name;"some,address";Bookkeeper;3;1,250.00^M
some name;"some,address";Haulier;1;600.00^M
some name;"some,address";Scaffolding Hire;1;250.00^M
some name;"some,address";Farmer;2;500.00^M
some name;"some,address";Builder;2;3000.00

I've tried removing the carriage return control characters that Windows added (^M), but %s/^V^M//g and %s/^M//g both result in "pattern not found". If I run %s/\r//g then the ^M characters are removed, but the same error persists when I run the Ruby script. I've also tried running set ffs=unix,dos but it has no effect. Thanks.

Update:
If I remove the double quotes around the Part 1 on the first line, then the script prints out what it should and then throws a new error: Unquoted fields do not allow \r or \n (line 10). If I then remove the \r characters, the script runs fine.

I understand that I would have to remove the \r characters, but why will it only work if I unquote the first value?

RTF
  • Just for debugging, do `File.readlines('out.csv')` and see what are the characters present at the end of each line. – Arup Rakshit Apr 11 '14 at 11:55
  • I was just running some more tests there, and if I remove the quotes around the 'Part 1' on the first line, then there's no error, and it prints out the csv values just fine ?? – RTF Apr 11 '14 at 11:58
  • Oh, but I still get the error if the ^M chars are there. But once they're removed with `%s/\r//g` and the quotes are removed around the 'Part 1' on the first line, the error is gone. Why? – RTF Apr 11 '14 at 12:01
  • Can you try `CSV.foreach('out.csv', :col_sep => ';', :row_sep => "\r", :force_quotes => true)` with file what you have? Don't remove `^M` and `'part'`... – Arup Rakshit Apr 11 '14 at 12:02
  • @ArupRakshit it causes the same error `Illegal quoting in line 1` – RTF Apr 11 '14 at 12:07
  • now just remove first line and try. Don't remove any `^M` from any line.. – Arup Rakshit Apr 11 '14 at 12:09
  • @ArupRakshit I get `Unquoted fields do not allow \r or \n (line 2)` but once I've removed those `\r` characters (which I don't have a problem with) then everything works fine. I do, however, need to quote the first value, but it's not liking that. – RTF Apr 11 '14 at 12:14
  • run without `:force_quotes => true)`. Did you ? – Arup Rakshit Apr 11 '14 at 12:15
  • still `illegal quoting` error without `:force_quotes` – RTF Apr 11 '14 at 12:20
  • I found something interesting after printing out character codes using ord(). The very first character code that is printed is not 34 which is the code for ". That's the second value. The first one is 65279. That must be causing the problem, but what is it? – RTF Apr 11 '14 at 12:23
  • just do - `File.readlines('out.csv')` to see those.. – Arup Rakshit Apr 11 '14 at 12:24
  • Found this: http://stackoverflow.com/questions/6784799/what-is-this-char-65279 – RTF Apr 11 '14 at 12:24
  • in what os you are now ? – Arup Rakshit Apr 11 '14 at 12:29
  • see this - http://stackoverflow.com/questions/19350213/illegal-quoting-in-line-1-using-ruby-csv – Arup Rakshit Apr 11 '14 at 12:31
  • I was just looking at a similar question/answer, it seems to be a popular issue. Anyway I ran it with the encoding set to 'bom|utf-8' and it runs fine (provided the `\r` have been removed). Thanks for all your help @ArupRakshit – RTF Apr 11 '14 at 12:33
  • glad to help.. you don't need to remove ^M manually, use `:row_sep => "\r"` – Arup Rakshit Apr 11 '14 at 12:37
  • Oh, actually the `:row_sep` option didn't help, it died with `Unquoted fields do not allow \r or \n (line 2)` after printing out the first line – RTF Apr 11 '14 at 12:42

1 Answer

The Illegal quoting error was caused by a byte order mark (BOM) at the very beginning of the file. It didn't show up in editors, but the Ruby CSV library choked on it unless :encoding => 'bom|utf-8' was set.

Once that was fixed, I still needed to remove all the ^M characters by running %s/\r//g in Vim. Everything worked fine after that.
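
For illustration, here is a minimal sketch of the BOM check and the encoding fix described above. The sample data and the helper names starts_with_bom? and read_rows are hypothetical, not taken from the original script, and the sketch assumes a current Ruby version, where the default row separator detection is expected to cope with \r\n line endings (this may differ from the Ruby 1.9 behaviour described in the question).

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical sample: a UTF-8 BOM followed by two semicolon-separated
# rows with Windows (CRLF) line endings, modelled on the file above.
SAMPLE = "\uFEFF\"Part 1\";;;;\r\n" \
         "some name;\"some,address\";Bookkeeper;3;1,250.00\r\n"

# True if the file starts with the three UTF-8 BOM bytes EF BB BF.
def starts_with_bom?(path)
  File.binread(path, 3) == "\xEF\xBB\xBF".b
end

# Reads all rows; the 'bom|utf-8' encoding tells Ruby to skip a
# leading BOM, if present, before the CSV parser sees the data.
def read_rows(path)
  CSV.read(path, col_sep: ';', encoding: 'bom|utf-8')
end

has_bom = nil
rows = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'out.csv')
  File.binwrite(path, SAMPLE) # write the bytes exactly as given

  has_bom = starts_with_bom?(path)
  rows = read_rows(path)
end

puts "BOM present: #{has_bom}"
rows.each { |row| p row }
```

Checking the first three bytes is a quick way to confirm whether a BOM is the cause before changing the parsing options.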

Arup Rakshit
RTF