I am trying to process some very large tab-separated files. The process is:

  Dir["#{@data_path}*.tsv"].each do |file|
    begin
      CSV.foreach(file, :col_sep => "\t") do |row|
        # assign columns to model and save
      end
      @log.info("Loaded all files into MySQL database illu.datafeeds")
    rescue Exception => e
      @log.warn("Unable to process the data feed: #{file} because #{e.message}")
      next
    end
  end

However, when I execute this I get the following error:

Unable to process the file: /Users/XXXXX_2013-06-12.tsv because Illegal quoting in line 153.

The files are too big for me to go in and fix the error rows. I would like the process to continue the loop and process the file even if there are error rows.

Any suggestions?

Thanks.


1 Answer

Just rescue the row that causes the error, using rescue nil.

You can also log the failing row with a logger.

Before the loop:

error_log ||= Logger.new("#{Rails.root}/log/my.log")

Inside the loop, instead of a bare rescue nil, use:

rescue error_log.info(row.to_s)

If the error is raised before the file starts being parsed (that is, before the .foreach call), you can open it as a raw file and parse it as CSV later, inside the loop (as mentioned here).
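A minimal sketch of that raw-file approach, assuming no field in the feed contains an embedded newline (each physical line is parsed on its own, so a quoted multi-line field would be split). The helper name each_valid_tsv_row is hypothetical, not part of the CSV library; it relies only on File.foreach, CSV.parse_line and CSV::MalformedCSVError.

```ruby
require "csv"

# Hypothetical helper (name chosen for illustration): reads the file as raw
# text, parses one line at a time, yields every row that parses cleanly and
# returns the 1-based numbers of the lines that were skipped.
def each_valid_tsv_row(path)
  bad_lines = []
  File.foreach(path).with_index(1) do |line, line_number|
    begin
      row = CSV.parse_line(line, col_sep: "\t")
    rescue CSV::MalformedCSVError
      # e.g. "Illegal quoting": remember the line and carry on with the next one
      bad_lines << line_number
      next
    end
    # parse_line can return nil when there is nothing to parse
    yield row unless row.nil?
  end
  bad_lines
end
```

Called as each_valid_tsv_row(file) { |row| ... } inside the Dir[...] loop, it lets the per-row work stay where it was, and the returned line numbers can be written to the log once the file is done.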

...or just rescue the whole file-parsing call:

 CSV.foreach(file, :col_sep => "\t") do |row|
    ...
 end rescue error_log.info(row.to_s) 
  • I'm not finding success with the method you suggested. If I change to rescue nil, the block still fails I just get no error message. If I use error_log.info(row.to_s) it fails because (row.to_s) does not exist outside the CSV.foreach. – analyticsPierce Jul 03 '13 at 07:39
  • I tried this. begin Dir["#{@data_path}*.tsv"].each do |file| begin CSV.foreach(file, :col_sep => "\t") do |row| # do stuff end @log.info("Loaded all files") rescue @log.info(row.to_s) end end end – analyticsPierce Jul 03 '13 at 07:42
  • I actually got this to work by moving the row error capture to a begin rescue block inside the CSV.foreach loop. Thanks. – analyticsPierce Jul 04 '13 at 07:42