This is going to be really difficult. Some of the things that make FasterCSV, well, faster make this particularly hard. Here's my best suggestion: FasterCSV can wrap an IO object. What you could do, then, is make your own subclass of File (itself a subclass of IO) that "holds onto" the result of the last gets. Then, when FasterCSV raises an exception, you can ask your special File object for the last line. Something like this:
class MyFile < File
  attr_accessor :last_gets

  def gets(*args)
    line = super
    @last_gets ||= ''                     # the accessor starts out nil
    @last_gets << line unless line.nil?   # gets returns nil at EOF
    line
  end
end
# then...
file = MyFile.open(filename, 'r')
csv  = FasterCSV.new(file)

row = true
while row
  begin
    break unless row = csv.shift
    # do things with the good row here...
  rescue FasterCSV::MalformedCSVError => e
    bad_row = file.last_gets
    # do something with bad_row here...
    next
  ensure
    file.last_gets = '' # nuke the last_gets "buffer" between rows
  end
end
Kinda neat, right? BUT! There are caveats, of course:
I'm not sure how much of a performance hit you take when you add an extra step to every gets call. It might be an issue if you need to parse multi-million-line files in a timely fashion.
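If that worries you, a quick Benchmark comparison against your own data should tell you whether the override matters in practice. A rough sketch (big.csv is just a stand-in for one of your real files):

require 'benchmark'

# 'big.csv' is a placeholder; point this at a real large file.
# Resetting last_gets each time mirrors the ensure clause above and
# keeps the buffer from growing while we read.
Benchmark.bm(10) do |bm|
  bm.report('File')   { File.open('big.csv', 'r')   { |f| nil while f.gets } }
  bm.report('MyFile') { MyFile.open('big.csv', 'r') { |f| f.last_gets = '' while f.gets } }
end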
This might or might not fail if your CSV file contains newline characters inside quoted fields. The reason for this is described in the source: basically, if a quoted value contains a newline, then shift has to do additional gets calls to get the entire line. There could be a clever way around this limitation, but it's not coming to me right now. If you're sure your file doesn't have any newline characters within quoted fields, then this shouldn't be a worry for you, though.
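To see why, here's a tiny illustration (with made-up data) of one logical row spanning two physical lines:

# One logical CSV row can span several physical lines when a quoted
# field contains a newline, so a per-gets buffer no longer maps
# one-to-one to rows.
data = "id,comment\n1,\"first line\nsecond line\"\n"
FasterCSV.parse(data).each { |row| p row }
# => ["id", "comment"]
# => ["1", "first line\nsecond line"]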
Your other option would be to read the file yourself with File#gets and pass each line in turn to FasterCSV.parse_line, but I'm pretty sure that in so doing you'd squander any performance advantage gained from using FasterCSV.