I'm trying to parse CSV files in Rails, which works great except for files saved in Excel (tested with version 16.26) on both Windows and Mac; CSVs saved in Numbers and Google Sheets work fine. Any character with an accent produces `Encoding::UndefinedConversionError: "\xEF" from ASCII-8BIT to UTF-8`.

Excel claims it saves in UTF-8.

I want accented characters to not throw errors when I upload CSVs saved in Excel.

Things I've tried:

  1. Setting the read encoding to bom|utf-8 (as per the BOM link), utf-8, iso-8859-1, utf-16, windows-1252, and ascii-8bit (also cycling through these in an array, dropping each one out of the array when it fails).

  2. The current code uses ISO8859-1:UTF-8, which is supposed to read the data as ISO8859-1 and then transcode it to UTF-8.

  3. Creating a tempfile, switching it to binmode, and calling CSV.parse(temp.path, encoding: "bom|utf-8"), per the first answer in this thread.

data = CSV.parse(csv, headers: true, header_converters: :symbol, skip_blanks: true, encoding: 'ISO8859-1:UTF-8')

It also works if I take a CSV saved in Excel, re-save it in Google Sheets or Numbers, and then upload it. Unfortunately, Excel is the most common source of the CSVs our users upload.

ju_ro
  • This should be the [Byte Order Mark (BOM) problem](https://stackoverflow.com/questions/543225/how-to-avoid-tripping-over-utf-8-bom-when-reading-files). Worst case scenario, just strip off the first two bytes and read it in using `parse`. – tadman Sep 06 '19 at 19:59
  • @tadman: 3 bytes, assuming a UTF-8 BOM – rici Sep 07 '19 at 03:24
  • I've tried setting the read encoding to bom|utf-8 and it still doesn't work – ju_ro Sep 09 '19 at 13:17

1 Answer

Solved by using the csvreader gem. The built-in CSV parser in Rails sucks.
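
If adding a gem is not an option, something along the lines of the BOM comments on the question may also work with the standard library alone. The sketch below is untested against real Excel exports and rests on a few assumptions: the upload arrives as a binary (ASCII-8BIT) string, which is what the `"\xEF" from ASCII-8BIT to UTF-8` error suggests; a leading `EF BB BF` is a UTF-8 BOM; and anything that is not valid UTF-8 is Windows-1252 (a guess about Excel's plain "CSV" export on Windows). The names `parse_excel_csv` and `UTF8_BOM` are made up for illustration.

```ruby
require 'csv'

# The three bytes of a UTF-8 byte order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze

# Hypothetical helper: normalise raw upload bytes to UTF-8, then parse.
def parse_excel_csv(raw)
  bytes = raw.b # binary copy, so the caller's string is left untouched
  # Drop the BOM if present (3 bytes, not 2).
  bytes = bytes.byteslice(3..-1) if bytes.start_with?(UTF8_BOM)

  # Relabel the bytes as UTF-8 without converting them.
  text = bytes.dup.force_encoding(Encoding::UTF_8)

  # Assumption: if the bytes are not valid UTF-8, treat them as Windows-1252
  # and transcode, replacing anything that cannot be mapped.
  unless text.valid_encoding?
    text = bytes.encode('UTF-8', 'Windows-1252', invalid: :replace, undef: :replace)
  end

  CSV.parse(text, headers: true, header_converters: :symbol, skip_blanks: true)
end
```

In a controller this would be called with the raw bytes of the upload, for example `parse_excel_csv(File.binread(path))`, where `path` stands for wherever the uploaded file lives. The key difference from passing `encoding:` to `CSV.parse` is that the string is already valid UTF-8 before the parser sees it, so no transcoding from ASCII-8BIT is attempted.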

ju_ro