I am finding the CSV parsing in Ruby 1.9.3 to be remarkably fragile, so much so that I wonder whether I am doing something wrong.

If I do the following in irb I get an error:

1.9.3-p125 :011 > require 'csv'
 => true
1.9.3-p125 :012 > a = 'one,two,three, "four, five",six'
 => "one,two,three, \"four, five\",six" 
1.9.3-p125 :013 > arr = CSV.parse(a)
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1805:in `to_a'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1805:in `read'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1379:in `parse'
    from (irb):13
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'

I've found that the problem is the extra space preceding the "four, five" value. If I remove the space, then it works.

1.9.3-p125 :010 > a = 'one,two,three,"four, five",six'
 => "one,two,three,\"four, five\",six" 
1.9.3-p125 :011 > arr = CSV.parse(a)
 => [["one", "two", "three", "four, five", "six"]]

Spaces in front of the other values do not cause a problem. The following parses just fine:

one, two, three,"four, five", six

Is there some parse option I am missing that makes using quoted values so fragile?

Dave Isaacs

1 Answer

This is correct behavior. It's not being fragile.

The comma after "three" ends that field, so the next field starts immediately with the space, and the quote that follows then sits in the middle of the field.

You can't validly put a quote in the middle of a field (without escaping it).
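
If the input cannot be fixed at the source, one possible workaround is to remove the whitespace between a comma and an opening quote before parsing. The following is only a minimal sketch: strip_space_before_quotes is a hypothetical helper, not part of the CSV library, and it assumes that the sequence comma, whitespace, double quote never occurs as real data inside a quoted field.

```ruby
require 'csv'

line = 'one,two,three, "four, five",six'

# Hypothetical helper (not part of the CSV library): drops spaces or tabs
# that sit between a field separator and an opening double quote.
# Assumption: this sequence never appears as data inside a quoted field.
def strip_space_before_quotes(text)
  text.gsub(/,[ \t]+"/, ',"')
end

# Parsing the raw line fails because the quote is not the first
# character of its field.
begin
  CSV.parse(line)
rescue CSV::MalformedCSVError => e
  puts "raw line rejected: #{e.message}"
end

# After the cleanup the quote opens the field, so the embedded comma is kept.
rows = CSV.parse(strip_space_before_quotes(line))
p rows  # => [["one", "two", "three", "four, five", "six"]]
```

A blanket substitution like this can silently alter data that breaks the stated assumption, so correcting the program that produces the file is the safer choice when that is possible.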

joelparkerhenderson
  • According to RFC 4180 (http://tools.ietf.org/html/rfc4180#page-2), a field can contain a comma if it is enclosed in double quotes. – Irfan Mulic Jul 08 '13 at 18:58