I have two CSV files that are essentially identical, but for some reason SmarterCSV cannot read the one named bad_file.
Here is a Gist of both files. Ruby's built-in CSV library reads bad_file without any problems.
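Files that look the same in an editor often differ in bytes the editor hides: a byte-order mark, CRLF versus bare CR line endings, stray NUL bytes from UTF-16 data, or invalid UTF-8. The following is a minimal, stdlib-only sketch that counts those traits for two files so they can be compared side by side. `csv_fingerprint` is a hypothetical helper, and it assumes the difference is at the byte level rather than in the visible data.

```ruby
# Summarise byte-level traits that commonly differ between CSV exports
# which look identical in an editor. csv_fingerprint is a hypothetical helper.
def csv_fingerprint(path)
  bytes = File.binread(path)                        # raw bytes, ASCII-8BIT
  text  = bytes.dup.force_encoding(Encoding::UTF_8) # same bytes viewed as UTF-8
  {
    size:       bytes.bytesize,
    utf8_bom:   bytes.start_with?("\xEF\xBB\xBF".b),
    utf16_bom:  bytes.start_with?("\xFF\xFE".b, "\xFE\xFF".b),
    valid_utf8: text.valid_encoding?,
    crlf:       bytes.scan("\r\n").size,            # Windows line endings
    lone_cr:    bytes.scan(/\r(?!\n)/).size,        # classic Mac line endings
    lone_lf:    bytes.scan(/(?<!\r)\n/).size,       # Unix line endings
    nul_bytes:  bytes.count("\x00"),                # often a sign of UTF-16 data
    tabs:       bytes.count("\t"),
    semicolons: bytes.count(";")                    # relevant for col_sep: :auto
  }
end

# Print both fingerprints; rows whose values differ are flagged with "!!".
if $PROGRAM_NAME == __FILE__ && ARGV.size == 2
  first, second = ARGV.map { |path| csv_fingerprint(path) }
  first.each_key do |key|
    flag = first[key] == second[key] ? '  ' : '!!'
    puts format('%s %-10s %-8s %s', flag, key, first[key], second[key])
  end
end
```

Saved under a hypothetical name such as `csv_fingerprint.rb`, it would be run as `ruby csv_fingerprint.rb good_file bad_file`.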
Before processing each file, I strip everything above the header row using the code below:
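```ruby
require 'tempfile'

def self.clean(file)
  # Drop everything before the first line that starts with "Date,".
  # gsub! returns nil when nothing was removed, so the original path is used then.
  if (csv = File.read(file).gsub!(/\A.+?(?=^Date,)/m, ''))
    tempfile = Tempfile.new('file_name')
    tempfile.write(csv)
    tempfile
  else
    file
  end
end
```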
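One detail worth checking in `clean`: Ruby buffers `IO#write` in memory, so a small CSV written to the Tempfile may not have reached the disk yet when `File.open(file, ...)` reopens it by path, and the reopened handle can then see an empty or truncated file. The sketch below keeps the same approach but flushes before returning. It is written as a top-level method for brevity (the original is `self.clean` on some class), and whether this is what trips up SmarterCSV is an assumption, not something confirmed here.

```ruby
require 'tempfile'

# Same idea as clean() above, but the Tempfile is flushed so that anything
# reopening it by path sees the full contents.
def clean(file)
  csv = File.read(file).gsub!(/\A.+?(?=^Date,)/m, '')
  return file unless csv # nothing above the header: use the original file

  tempfile = Tempfile.new('file_name')
  tempfile.write(csv)
  tempfile.flush  # push Ruby's in-memory write buffer to the OS
  tempfile.rewind # in case the caller reads from this handle directly
  tempfile
end
```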
I then pass the result to SmarterCSV like this:
```ruby
File.open(file, encoding: 'bom|utf-8') do |f|
  chunk = SmarterCSV.process(f, {
    verbose: true,
    remove_empty_hashes: true,
    col_sep: :auto,
    force_utf8: true,
    force_simple_split: true,
    strip_chars_from_headers: /[\-"\xEF\xBB\xBF]/,
    duplicate_header_suffix: ''
  })
end
```
I can't figure out what is even different about the two CSV files, let alone why SmarterCSV can't process the bad one. Also, if anyone has a better way to strip the unneeded rows from the top of the spreadsheet, that alone might solve the problem.
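As for a different way to drop the rows above the header: one option, sketched here under the assumption that the header row always starts with `Date,`, is to skip the temporary file entirely and return an in-memory `StringIO`. `strip_preamble` is a hypothetical name. The sketch also normalizes CRLF and bare CR line endings to LF before searching, because Ruby's `^` anchor only matches after `\n`; in a file with bare `\r` line endings, the `(?=^Date,)` lookahead in the original regex will not find the header.

```ruby
require 'stringio'

# Hypothetical alternative to clean(): returns an in-memory IO that starts at
# the header row. Assumes the file is valid UTF-8 and the header starts with
# header_prefix ("Date," by default, matching the regex above).
def strip_preamble(path, header_prefix: 'Date,')
  # 'bom|utf-8' drops a leading UTF-8 byte-order mark if one is present
  text = File.read(path, mode: 'r:bom|utf-8')
  # Normalize CRLF and bare CR to LF so ^ can see every line start
  text = text.gsub(/\r\n?/, "\n")
  start = text.index(/^#{Regexp.escape(header_prefix)}/)
  # If no header line is found, hand back the whole text unchanged
  StringIO.new(start ? text[start..-1] : text)
end
```

Usage would be `SmarterCSV.process(strip_preamble(file), options)`. This assumes SmarterCSV accepts a `StringIO` the same way it accepts the `File` handle in the code above, which is worth verifying against the installed SmarterCSV version. Since the BOM is already removed on read, the `\xEF\xBB\xBF` part of `strip_chars_from_headers` becomes redundant but harmless.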