
I have a Ruby script that reads in some CSV files, processes them, and writes out a new set of CSV files. I'm using Ruby 1.9.2 and the new standard 'csv' library (formerly FasterCSV). The source CSV files contain non-ASCII characters (é etc.), but they come out of Excel, so the encoding is not properly declared. Specifically, when I load the file into Ruby I get this:

require 'csv'
t = CSV.table('file.csv',:converters=>nil)
t.to_s.encoding
# encoding is ASCII-8BIT

even though the actual bytes are UTF-8. My issue is that I can't seem to get this string, which is marked as ASCII-8BIT, to actually come out as UTF-8. When I try this:

require 'csv'
t = CSV.table('file.csv',:converters=>nil)
f = File.new('output.csv','w:utf-8')
f.write(t.to_s.force_encoding('utf-8'))
f.close

The output file is still listed as being encoded in ASCII. What do I need to do to get the output file to be encoded in UTF-8?
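The distinction that matters here is between `String#force_encoding`, which only relabels the bytes, and `String#encode`, which actually transcodes them. A minimal sketch of the difference (the `\x8E` byte is "é" in MacRoman, used here as a stand-in for what Excel may have written):

```ruby
# MacRoman bytes: \x8E is "é" in that encoding
mac = "\x8E".b.force_encoding("MacRoman")

utf = mac.encode("UTF-8")               # transcodes the bytes -> "\xC3\xA9" ("é")
bad = "\x8E".b.force_encoding("UTF-8")  # relabels only; bytes unchanged
bad.valid_encoding?                     # => false (0x8E alone is not valid UTF-8)
```

So if the file's bytes were never UTF-8 to begin with, `force_encoding('utf-8')` produces a string that claims to be UTF-8 but isn't valid, which is consistent with the symptoms described above.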

John Sullivan
  • Please look at this discussion: http://stackoverflow.com/questions/7047944/ruby-read-csv-file-as-utf-8-and-or-convert-ascii-8bit-encoding-to-utf-8 Hope this helps you. – WarHog Oct 19 '11 at 19:42

1 Answer


If you've used Mac Excel to output the files, they'll actually be in MacRoman encoding. The code below may not be the best way to do it, but it works:

rows = []
CSV.foreach("../yourfile.csv", col_sep: ",", encoding: "MacRoman") do |row|
  rows << row.map! { |v| v.encode("UTF-8") unless v.nil? }
end

then you can convert to CSV::Table or whatever
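Assuming the MacRoman guess is right, a variation closer to the original `CSV.table` approach is to let Ruby's IO layer transcode for you with an `external:internal` encoding pair, so the values arrive already in UTF-8. A sketch, with `input.csv`/`output.csv` as placeholder filenames (the sample file is created inline so the snippet is self-contained):

```ruby
require 'csv'

# Stand-in for the Excel output: \x8E is "é" in MacRoman.
File.binwrite('input.csv', "name,city\nRen\x8Ee,Montr\x8Eal\n")

# Read as MacRoman, transcoding to UTF-8 on the way in.
table = CSV.table('input.csv', encoding: 'MacRoman:UTF-8', converters: nil)

# Write a proper UTF-8 file.
CSV.open('output.csv', 'w:UTF-8') do |csv|
  csv << table.headers
  table.each { |row| csv << row }
end
```

Since the strings are genuinely UTF-8 after the read, no `force_encoding` is needed on the way out.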

David Burrows
  • 5,217
  • 3
  • 30
  • 34