44

I am using Ruby's CSV.read on very large files. From time to time the library encounters poorly formatted lines and raises an error, for instance:

"Illegal quoting in line 53657."

It would be easier to ignore such a line and skip it than to go through each CSV file and fix the formatting. How can I do this?

JZ.

5 Answers

86

I had this problem in a line like 123,456,a"b"c

The problem is that the CSV parser expects double quotes, if they appear, to surround the entire comma-delimited field.

The solution is to use a quote character other than " that I was sure would not appear in my data:

CSV.read(filename, :quote_char => "|")
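
To see the effect on a single line, here is a minimal sketch using the sample line from above. The variable names are made up for illustration, and it assumes the character "|" never occurs in the data:

```ruby
require 'csv'

line = '123,456,a"b"c'

# With the default quote character ("), the stray quotes are rejected.
default_failed = false
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  default_failed = true
  puts "default parser failed: #{e.message}"
end

# With a quote character that never occurs in the data (assumed here to be "|"),
# the double quotes are kept as ordinary text inside the field.
fields = CSV.parse_line(line, quote_char: "|")
p fields
```

Note that this changes how every line is parsed: fields that are legitimately wrapped in double quotes will keep their quotes as part of the value.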

Ray Baxter
  • In the requestor's situation he specifically has massive data and just wants to skip errors. Changing the :quote_char just helped me out in my situation though. – Jesse Clark Mar 05 '13 at 22:06
  • This worked in my situation as well; it ran through 100+ thousand lines of CSV data without errors. – Paul Carlton Jan 02 '15 at 06:56
  • Awesomeness continues in 2015 as well :) Thanks. It did take me a couple of hours to get here :) – Suraj Jan 24 '15 at 11:03
  • For my data, I was unsure which characters would not appear, but it apparently works even with unprintable characters like `quote_char: "\x00"`. – Max Mar 27 '15 at 14:32
  • If, like me, you are here because of a Google search to fix this error: my issue was that I had a `""` in my CSV file (probably from my text editor adding an additional `"`). As per the Ruby docs, `CSV will always consider a double sequence of this character to be an escaped quote. This String will be transcoded into the data’s Encoding before parsing.` https://ruby-doc.org/stdlib-2.3.0/libdoc/csv/rdoc/CSV.html – Jay Killeen Jan 06 '17 at 01:50
  • At least in recent versions, setting the quote_char to nil should also work and is semantically more correct. – Felix Mar 13 '23 at 17:45
44

The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:

When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.

To enable it, pass it as an option to the CSV read/parse/new methods:

CSV.read(filename, liberal_parsing: true)
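
As a small illustration, the sketch below parses a line such as 123,456,a"b"c, which the default parser rejects with an "Illegal quoting" error. The variable names are only for the example, and it assumes Ruby 2.4 or later:

```ruby
require 'csv'

line = '123,456,a"b"c'

# liberal_parsing keeps double quotes that appear inside an unquoted field
# instead of raising CSV::MalformedCSVError.
row = CSV.parse_line(line, liberal_parsing: true)
p row
```

This only relaxes the parser; input that is broken in other ways, such as a quoted field that is never closed, may still raise an error.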
Mr. Tim
Will Madden
6

Try setting the quote character to something that cannot appear in your data, such as the null byte:

require 'csv'
# Pass the options as keyword arguments rather than as a positional hash.
CSV.foreach(file, headers: :first_row, quote_char: "\x00") do |line|
  p line
end
Tombart
6

Don't let CSV both read and parse the file.

Just read the file yourself, hand each line to CSV.parse_line, and rescue any exception it raises.
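
A minimal sketch of that approach might look like the following. The helper name `each_parsable_row` is hypothetical (it is not part of the CSV library), and the sketch assumes that no field contains an embedded line break, since each physical line is parsed on its own:

```ruby
require 'csv'

# Hypothetical helper: parses each line separately, yields the rows that
# parse cleanly, and returns the 1-based numbers of the lines it skipped.
# Assumption: no quoted field spans more than one physical line.
def each_parsable_row(lines)
  skipped = []
  lines.each_with_index do |line, index|
    begin
      row = CSV.parse_line(line)
    rescue CSV::MalformedCSVError
      skipped << index + 1
      next
    end
    yield row if row # parse_line returns nil for an empty string
  end
  skipped
end

# Small in-memory sample; the second line has illegal quoting.
sample = ["a,b\n", "1,x\"y\"z\n", "2,ok\n"]
rows = []
skipped = each_parsable_row(sample) { |row| rows << row }
p rows
p skipped
```

For a real file, `File.foreach(filename)` can be passed in place of the sample array; it returns an enumerator, so the file is read line by line rather than loaded into memory at once.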

DigitalRoss
1

Apparently this error can also be caused by unprintable BOM characters. This thread suggests using a file mode to force a conversion, which is what finally worked for me.

require 'csv'

CSV.open(@filename, 'r:bom|utf-8') do |csv|
  # do something
end
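
For a self-contained way to try this out, the sketch below writes a small file that starts with a UTF-8 BOM and reads it back with the same file mode. The file contents and variable names are invented for the example:

```ruby
require 'csv'
require 'tempfile'

rows = nil
Tempfile.create(['bom_example', '.csv']) do |file|
  # "\uFEFF" is the byte order mark; here it sits directly before a quoted field.
  file.write("\uFEFF\"name\",\"qty\"\n\"apple\",\"3\"\n")
  file.flush

  # The 'bom|utf-8' mode strips the BOM before the parser sees the first field.
  CSV.open(file.path, 'r:bom|utf-8') do |csv|
    rows = csv.read
  end
end
p rows
```

Without the BOM-stripping mode, the invisible character ends up in front of the opening quote of the first field, which is one way to end up with an "Illegal quoting" error on line 1.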
allknowingfrog