I have found a CSV parsing issue with FasterCSV (1.5.0) which seems like a genuine bug, but which I'm hoping there's a workaround for.

Basically, adding a space after the separator (in my case a comma) when the fields are enclosed in quotes generates a MalformedCSVError.

Here's a simple example:

# No quotes on fields -- works fine
FasterCSV.parse_line("one,two,three")
=> ["one", "two", "three"]

# Quotes around fields with no spaces after separators -- works fine
FasterCSV.parse_line("\"one\",\"two\",\"three\"")
=> ["one", "two", "three"]

# Quotes around fields but with a space after the first separator -- fails!
FasterCSV.parse_line("\"one\", \"two\",\"three\"")
=> FasterCSV::MalformedCSVError: Illegal quoting on line 1.

Am I going mad, or is this a bug in FasterCSV?

Olly

3 Answers

The MalformedCSVError is correct here.

Leading and trailing spaces in CSV are not ignored; they are considered part of the field. In your third example, the second field therefore starts with a space and then contains unescaped double quotes, which is what causes the illegal quoting error.
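To see both halves of that explanation, here is a minimal sketch. It assumes Ruby's standard `csv` library (FasterCSV became the standard CSV library in Ruby 1.9) rather than the FasterCSV 1.5.0 gem from the question, so the exact error message may differ from the one quoted above.

```ruby
require "csv"

# Without quotes, the space after the separator is simply kept
# as the first character of the second field.
spaced = CSV.parse_line("one, two,three")
p spaced

# With quotes, the second field starts with a space, so the double quote
# that follows is an unescaped quote inside an unquoted field.
strict_error = nil
begin
  CSV.parse_line('"one", "two","three"')
rescue CSV::MalformedCSVError => e
  strict_error = e
end
p strict_error
```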

Maybe this library is just more strict than others you have used.

Ben James
  • Isn't the space saying that the field is actually not surrounded by quotes (since the first char is not a quote) and that quotes should be taken as part of the field content? – Vincent Robert Nov 27 '09 at 10:42
  • Looks like I'm wrong. "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields." -- http://tools.ietf.org/html/rfc4180#section-2 – Vincent Robert Nov 27 '09 at 10:45
  • You're right, I didn't realise there was a 'spec' for CSV but it seems that there is. FasterCSV is indeed just very strict. – Olly Nov 30 '09 at 11:17

Maybe you could set the :col_sep option to ', ' to make it parse files like that.
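A minimal sketch of that idea, assuming Ruby's standard `csv` library (the successor to FasterCSV) and a line in which every separator is exactly a comma followed by one space:

```ruby
require "csv"

# Every separator here is ", " (comma plus one space).
line = '"one", "two", "three"'

# Treat the two-character string ", " as the column separator.
fields = CSV.parse_line(line, col_sep: ", ")
p fields
```

This only fits input that uses comma-space consistently. The line in the question mixes "," and ", ", so I would not expect this option alone to handle it.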

Robert Massa

I had hoped that the :col_sep option might allow a regular expression, but it seems to be used for both reading and writing, which is a shame. The documentation doesn't hold out much hope and your need is probably more immediate than could be satisfied by requesting a change or submitting a patch ;-)

If you're calling #parse_line explicitly, then you could always call

gsub(/,\s*/, ',')

on your input line. That regular expression might need to change significantly if you anticipate the possibility of comma-space within quoted strings. (I'd suggest reposting such a question here with a suitable tag and let the RegEx mavens loose on it should that be the case).
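As a rough sketch of both the simple substitution and one possible narrowing of it (again assuming Ruby's standard `csv` library in place of FasterCSV): the second pattern is my own illustration, not something from the library, and it only removes whitespace that sits between a comma and an opening double quote.

```ruby
require "csv"

line = '"a, b", "c"'

# Simple cleanup: strips whitespace after every comma, including the
# comma inside the first quoted field, so the field content changes.
naive = CSV.parse_line(line.gsub(/,\s*/, ","))

# Narrower cleanup (illustrative): strips whitespace only when it is
# followed by a double quote, leaving "a, b" untouched.
narrow = CSV.parse_line(line.gsub(/,\s+(?=")/, ","))

p naive
p narrow
```

The narrower pattern is still not a full CSV-aware fix: a quoted field containing a comma, whitespace and then an escaped quote would also be rewritten.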

Mike Woodhouse