I am processing a big CSV file with a lot of columns and rows (tens of thousands of rows, so it's nearly impossible to check it cell by cell).
A bad character has probably slipped in somewhere in the file. I've tried a begin - rescue construction
to skip the currently processed row when there's an error (especially when the error is in the header line), but it doesn't work: the script still stops when it stumbles upon the character.
Is there any way to ignore/skip this "bad" character/symbol?
For processing the CSV file, I am using SmarterCSV.
EDIT: Some code
datas = SmarterCSV.process(file, { :col_sep => ';', :chunk_size => 100, :remove_empty_values => false, :remove_empty_hashes => false }) do |data|
  begin
    data.each do |d|
      user.something = d[:hobby]
      ...
      # here is basically just saving data from the file to database tables
      ...
    end
  rescue => e
    logger.warn "Ooops, an error occurred while processing this record: #{e}"
  end
end
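I suspect the rescue inside the block never fires because the invalid byte makes the parsing itself raise, before the block is ever called for that chunk. A minimal reproduction with Ruby's stdlib CSV (not SmarterCSV; the data and the bad byte are made up) shows the same symptom:

```ruby
require "csv"

# A single invalid UTF-8 byte (\xFE) makes parsing raise before the row
# containing it reaches the block, so a rescue inside the block cannot
# catch it. (Made-up data; stdlib CSV stands in for SmarterCSV here.)
bad = "name;hobby\nJoe;read\xFEing\n".dup.force_encoding("UTF-8")

begin
  CSV.parse(bad, col_sep: ";") { |row| puts row.inspect }
rescue StandardError => e
  puts "parse failed: #{e.class}"  # ArgumentError or CSV::MalformedCSVError,
                                   # depending on the Ruby/csv version
end
```

This is why moving the begin - rescue around the whole process call, rather than inside the block, is at least worth trying to see which exception class is actually raised.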
I've also tried to put the begin - rescue construction inside data.each, but that didn't help to avoid the situation either.
A possible solution would be to re-encode every element/cell of the file, but each row has about 70 cells... So I am trying to find a better solution, if there is one.
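One way to avoid re-encoding all ~70 cells per row would be to clean the whole file content once, before handing it to SmarterCSV, using String#scrub (Ruby >= 2.1). A sketch, where a small temp file with a planted bad byte stands in for the real CSV (paths and data are placeholders):

```ruby
require "tempfile"

# Placeholder stand-in for the real CSV: a tiny file with one bad byte.
dirty = Tempfile.new(["dirty", ".csv"])
dirty.binmode
dirty.write("name;hobby\nJoe;read\xFEing\n")  # \xFE = the "bad" byte
dirty.close

# Read the raw content as UTF-8 and drop any bytes that are invalid in it.
raw   = File.read(dirty.path, encoding: "UTF-8")
clean = raw.scrub("")  # or raw.scrub("?") to mark where the byte was

clean_path = dirty.path.sub(/\.csv\z/, ".clean.csv")
File.write(clean_path, clean)
# datas = SmarterCSV.process(clean_path, { :col_sep => ';', :chunk_size => 100 })
puts clean
```

Since the source file is us-ascii, any byte above 0x7F is invalid in UTF-8 anyway, so scrubbing the whole file once should remove exactly the offending bytes without touching the rest.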
EDIT2: I've added # encoding: UTF-8 at the top of the file that processes the CSV. The CSV file itself has a us-ascii charset.