20

I have a line in my CSV file that has some escaped quotes:

173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"

When I try to parse it with the Ruby CSV parser:

require 'csv'
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
  puts row
end

I get this error:

.../1.9.3-p327/lib/ruby/1.9.1/csv.rb:1914:in `block (2 levels) in shift': Missing or stray quote in line 122 (CSV::MalformedCSVError)

How can I get around this error?

Andrew

3 Answers

29

The \" escape is typical Unix style, whereas Ruby's CSV library expects an embedded quote to be doubled, as ""

To parse it:

require 'csv'
text = File.read('test.csv').gsub(/\\"/,'""')
CSV.parse(text, headers: true, header_converters: :symbol) do |row|
  puts row
end

Note: if your CSV file is very large, reading the entire file into memory uses a lot of RAM. Consider reading the file one line at a time.
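
A minimal sketch of the line-at-a-time idea, assuming no quoted field contains an embedded newline (otherwise a physical line is not a full record). The helper name each_unescaped_row is made up for illustration, and a StringIO stands in for the real file so the snippet runs on its own:

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper (not part of the CSV library): parses `io` one physical
# line at a time, so only a single line is held in memory.
# Assumption: no quoted field contains an embedded newline.
def each_unescaped_row(io)
  return enum_for(:each_unescaped_row, io) unless block_given?

  io.each_line do |line|
    # Rewrite backslash-escaped quotes (\") as doubled quotes ("").
    yield CSV.parse_line(line.gsub('\"', '""'))
  end
end

# StringIO stands in for a real file so the sketch is self-contained.
sample = StringIO.new(<<'EOS')
ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
EOS

rows = each_unescaped_row(sample).to_a
puts rows.last[1]
```

With a real file, something like File.open('test.csv') { |f| each_unescaped_row(f) { |row| p row } } should work the same way.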

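Note: if your CSV file may contain a backslash directly before a quote's backslash (for example a field that ends in an escaped backslash), use Andrew Grimm's suggestion from the comments below, which replaces \" only when it is not preceded by another backslash: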

gsub(/(?<!\\)\\"/,'""')
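
To see what the lookbehind changes, here is a small sketch comparing the two patterns on a made-up line whose second field ends in an escaped backslash. The sample data and variable names are assumptions for illustration only:

```ruby
require 'csv'

BACKSLASH = "\\" # a single backslash character

# Made-up sample: the second field ends in an escaped backslash (two
# backslash characters) right before its closing quote.
line = %Q(1,"trailing #{BACKSLASH}#{BACKSLASH}","Japan")

# Plain pattern: treats the second backslash plus the closing quote as an
# escaped quote and rewrites it, which corrupts the field boundary.
naive = line.gsub(/\\"/, '""')

# Lookbehind pattern: skips a \" that is preceded by another backslash,
# so this line is left unchanged.
guarded = line.gsub(/(?<!\\)\\"/, '""')

fields = CSV.parse_line(guarded)
p fields
```

CSV itself does not interpret backslashes, so the parsed field still contains both backslash characters. As the comment says, this is a hack: an escaped backslash followed by an escaped quote would still be missed.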
joelparkerhenderson
  • This is definitely cleaner than the other answer. Also, I can't get the other answer to work in MRI 2.1. Not sure if it changed the way that `CSV.parse` works or what. – Chris Peters Jan 18 '14 at 15:00
  • I had to check there wasn't a backslash before the backslash: `gsub(/(?<!\\)\\"/,'""')`. This is a hack, but it might get you further than otherwise. – Andrew Grimm Feb 06 '14 at 12:10
20

CSV supports "converters", which we can normally use to massage the content of a field before it's passed back to our code. For instance, that can be used to strip extra spaces on all fields in a row.
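
As a rough illustration of a converter, this sketch strips surrounding spaces from every field. The lambda name strip_spaces and the sample line are made up for the example:

```ruby
require 'csv'

# Hypothetical converter: strips surrounding whitespace from string fields
# and passes anything else (such as nil for an empty field) through untouched.
strip_spaces = ->(field) { field.is_a?(String) ? field.strip : field }

rows = CSV.parse(" 173 , Yukihiro Matsumoto ,Japan \n", converters: [strip_spaces])
p rows
```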

Unfortunately, the converters fire off after the line is split into fields, and it's during that step that CSV is getting mad about the embedded quotes, so we have to get between the "line read" step, and the "parse the line into fields" step.
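
A quick way to check that ordering is to pass in a converter that records every field it is handed and then parse the problem line. In this sketch (the names seen and record are made up) the parser should raise before the converter sees anything:

```ruby
require 'csv'

seen = []
# Hypothetical converter that only records each field it is given.
record = ->(field) { seen << field; field }

# In a single-quoted Ruby string, \" stays a literal backslash plus quote.
bad_line = '173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"'

error = begin
  CSV.parse(bad_line, converters: [record])
  nil
rescue CSV::MalformedCSVError => e
  e
end

puts error.class
puts seen.length
```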

This is my sample CSV file:

ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"

Keeping the line-by-line style of your CSV.foreach loop, this is my example code for parsing it without CSV getting mad:

require 'csv'
require 'pp'

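header = []
File.foreach('test.csv') do |csv_line|
  # Turn each \" into "" before CSV sees the line, then parse that single line.
  row = CSV.parse(csv_line.gsub('\"', '""')).first

  # The first line holds the column names; keep them as symbols.
  if header.empty?
    header = row.map(&:to_sym)
    next
  end

  # Pair each header with its value to build a hash for this row.
  row = Hash[header.zip(row)]
  pp row
  puts row[:Name]
end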

And the resulting hash and name value:

{:ID=>"173", :Name=>"Yukihiro \"The Ruby Guy\" Matsumoto", :Country=>"Japan"}
Yukihiro "The Ruby Guy" Matsumoto

I assumed you were wanting a hash back because you specified the :headers flag:

CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
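
For comparison, here is a small sketch of what those options give you on a line without escaped quotes (the sample text is made up). Each row comes back as a CSV::Row, and the :symbol header converter also downcases the names, so the keys are :id and :name rather than the :ID and :Name my code above produces:

```ruby
require 'csv'

# Made-up sample without escaped quotes, so CSV parses it directly.
text = "ID,Name,Country\n173,Yukihiro Matsumoto,Japan\n"

row = CSV.parse(text, headers: true, header_converters: :symbol).first
as_hash = row.to_h

p row.class
p as_hash
puts row[:name]
```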
the Tin Man
-11

Open the file in MS Excel and save it as MS-DOS Comma Separated (.csv).

Karan Verma