While trying to print Duplicaci¾n out of a CSV file, I get the following error:

ArgumentError - invalid byte sequence in UTF-8

I'm using Ruby 1.9.3-p362 and opening the file using:

CSV.foreach(fpath, headers: true) do |row|

How can I skip an invalid character without using iconv or str.encode(undef: :replace, invalid: :replace, replace: '')?

I tried answers from the following questions, but nothing worked:

cabe56

1 Answer

This is from the CSV.open documentation:

You must provide a mode with an embedded Encoding designator unless your data is in Encoding::default_external(). CSV will check the Encoding of the underlying IO object (set by the mode you pass) to determine how to parse the data. You may provide a second Encoding to have the data transcoded as it is read just as you can with a normal call to IO::open(). For example, "rb:UTF-32BE:UTF-8" would read UTF-32BE data from the file but transcode it to UTF-8 before CSV parses it.

That applies to any method in CSV that opens a file.
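As a minimal sketch of how that could look for the CSV.foreach call in the question: the example below assumes, purely for illustration, that the file is Windows-1252. The sample file, its `name` column, and the encoding choice are all made up here; the real encoding of the asker's file still has to be confirmed.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample file: "Duplicación" stored as Windows-1252,
# where "ó" is the single byte 0xF3 (not valid on its own in UTF-8).
bytes = "name\nDuplicaci\xF3n\n".dup.force_encoding(Encoding::BINARY)
file = Tempfile.new(["sample", ".csv"])
file.close
File.binwrite(file.path, bytes)

names = []
# "Windows-1252:UTF-8" means: the data on disk is Windows-1252,
# transcode it to UTF-8 before CSV parses it.
CSV.foreach(file.path, headers: true, encoding: "Windows-1252:UTF-8") do |row|
  names << row["name"]
end

puts names.first
```

With the external encoding declared, nothing has to be skipped or replaced: the byte is decoded as the character it was meant to be.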

Also read the section of the documentation titled:

CSV and Character Encodings (M17n or Multilingualization)

Ruby is expecting UTF-8 but is seeing bytes that are not valid UTF-8. I'd suspect the file is Windows-1252, ISO-8859-1, or a variant.
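One rough way to test that suspicion is to look at the raw bytes directly. In the sketch below, the `raw` string is a hypothetical stand-in for what `File.binread(fpath)` might return, and the candidate list is only a guess. Note that `encode` is used here only to convert from a declared source encoding, not to replace or drop characters.

```ruby
# Hypothetical raw bytes, standing in for File.binread(fpath).
raw = "Duplicaci\xF3n".dup.force_encoding(Encoding::BINARY)

# Would these bytes be acceptable if treated as UTF-8?
utf8_ok = raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Reinterpret the same bytes under each candidate encoding,
# then convert the result to UTF-8 for display.
candidates = ["Windows-1252", "ISO-8859-1"]
decoded = candidates.map do |name|
  [name, raw.dup.force_encoding(name).encode(Encoding::UTF_8)]
end

puts "valid UTF-8? #{utf8_ok}"
decoded.each { |name, text| puts "#{name}: #{text}" }
```

If a candidate produces readable text, it is a plausible encoding to declare when opening the file. It is not proof, since several single-byte encodings share many of the same code points.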

the Tin Man