
I am processing a big CSV file with many columns and tens of thousands of rows, so it's nearly impossible to check it cell by cell.

Somewhere in the file there is probably a bad character. I've tried a begin/rescue construction to skip the currently processed row when an error occurs (the error mainly points at the header line), but it doesn't work; the script still stops when it stumbles upon the character.

Is there any way to ignore/skip this "bad" character/symbol? For processing the CSV file, I am using SmarterCSV.

EDIT: Some code

datas = SmarterCSV.process(file, { :col_sep => ';', :chunk_size => 100, :remove_empty_values => false, :remove_empty_hashes => false }) do |data|
  begin
    data.each do |d|
      user.something = d[:hobby]
      # ...
      # here is basically just saving data from the file to database tables
      # ...
    end
  rescue => e
    logger.warn "Ooops, an error occurred while processing this record: #{e}"
  end
end

I've also tried moving the begin/rescue inside data.each, but that didn't help either.

One workaround would be to fix the encoding of every element/cell of the file, but each row has about 70 cells, so I'm looking for a better solution, if there is one.

EDIT2: I've added # encoding: UTF-8 at the top of the script that processes the CSV. The CSV file has a us-ascii charset.

user984621

1 Answer


I was having a similar issue, and providing the encoding while opening the file solved the problem for me:

file = File.open(params[:file].tempfile, "r:bom|utf-8")
SmarterCSV.process(file, {chunk_size: 10000, col_sep: ";"}) do |chunk|
  # ...
end
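For what it's worth, the "bom|utf-8" part of the mode string strips a UTF-8 byte order mark if the file starts with one; without it, the BOM gets glued to the first header name. A small self-contained check (bom.csv is a made-up file):

```ruby
# Write a file that starts with a UTF-8 BOM, then read it back with
# "r:bom|utf-8": the BOM is consumed, so the header starts at "name".
File.binwrite("bom.csv", "\xEF\xBB\xBFname;hobby\n")
header = File.open("bom.csv", "r:bom|utf-8") { |f| f.readline }
# header == "name;hobby\n" (no leading "\uFEFF")
```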