I have a problem reading a CSV file. The file comes from Windows, so I suspect an encoding issue. My code looks like this:

CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|    
    CSV.parse(open(doc.file.url), headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n", encoding: 'utf-8').each_with_index do |line, index| 

        csv << line.headers if index == 0

        # do something with row

        csv << line 
    end
end

I have to open an existing file and fill in some of its columns, so I write the result to a new file. The existing file is stored on Dropbox, so I have to use the open method.

The problem is that I get an error in this line:

 CSV.parse(open(doc.file.url), headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n", encoding: 'utf-8').each_with_index do |line, index| 

The error is:

  Illegal quoting in line 1. CSV::MalformedCSVError

I checked, and it seems the file has no BOM characters (though I am not sure I checked correctly). The problem seems to be with the quote character. The exception is thrown for every line in the file.

This is the file that causes me problems: https://dl.dropboxusercontent.com/u/3900955/geo_bez_adresu_10_do_testow_small.csv

I tried different approaches from StackOverflow, but nothing helped. For example, I changed my code to this:

CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
    open(doc.file.url) do |f|
        f.each_line do |line|
            CSV.parse(line, 'r:bom|utf-8') do |row|
               csv << row
            end
        end
    end
end 

but that didn't help either. I would be grateful for any help with parsing this file.

======= edit =========

When I save the same file on Windows with the encoding "ANSI as UTF-8" (in Notepad++), I can parse it correctly. From the discussion What is "ANSI as UTF-8" and how can I make fputcsv() generate UTF-8 w/BOM?, it seems that the original file has a BOM. How can I check in Ruby whether my file has a BOM, and how can I parse a CSV file with a BOM?
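One way to answer the "does it have a BOM" part is to look at the first three bytes of the data, since a UTF-8 BOM is the byte sequence EF BB BF. The following is only a minimal sketch: `utf8_bom?` and `UTF8_BOM` are hypothetical names introduced here, and the sample strings stand in for the real file contents.

```ruby
# The UTF-8 byte order mark as raw bytes (EF BB BF).
UTF8_BOM = "\xEF\xBB\xBF".b.freeze

# Hypothetical helper: true when the given data begins with a UTF-8 BOM.
# The comparison is done on raw bytes, so the string's declared encoding does not matter.
def utf8_bom?(data)
  data.byteslice(0, 3).to_s.b == UTF8_BOM
end

with_bom    = "\xEF\xBB\xBFid;name\r\n".b   # sample data starting with a BOM
without_bom = "id;name\r\n".b               # the same data without a BOM

puts utf8_bom?(with_bom)
puts utf8_bom?(without_bom)
```

For a real file, the same check could be applied to the first bytes read in binary mode, for example the result of `File.binread(path, 3)`.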

wawka

2 Answers

4

CSV.parse() requires a string as its first argument, but you're passing a File object instead. As a result, parse() ends up parsing the value of (file object).to_s, and that causes the error.

Update

To read a file with a BOM, you can do this:

CSV.new(File.open('file.csv', 'r:bom|utf-8'), col_sep: ';').each do |row|
  ...
end

Reference: https://stackoverflow.com/a/7780559/445221
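To see what the 'r:bom|utf-8' mode does, here is a small self-contained sketch. It assumes a local file: the sample is written to a temporary file first, and its contents and the name `first_row` are made up for illustration.

```ruby
require 'csv'
require 'tempfile'

first_row = nil

# Write a small sample file that starts with a UTF-8 BOM (EF BB BF).
Tempfile.create(['bom_sample', '.csv']) do |tmp|
  tmp.binmode
  tmp.write("\xEF\xBB\xBFid;name\r\n1;Anna\r\n".b)
  tmp.flush

  # Opening with 'r:bom|utf-8' skips a leading BOM before CSV sees the data.
  File.open(tmp.path, 'r:bom|utf-8') do |f|
    first_row = CSV.new(f, col_sep: ';').first
  end
end

p first_row
```

The first field comes back as plain "id", without the BOM character glued to the front.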

konsolebox
    Yes, I saw this thread on SO, but I can't use File.open because, as I said, the file is remote, so I need to use the open method instead. When I change my code to CSV.new(open(doc.file.url, 'r:bom|utf-8')..... I get an error: "unknown encoding name - bom|utf-8". So how can I set UTF-8 with BOM encoding for the URI.open method? – wawka Aug 11 '14 at 04:56
1

I didn't find any way to read directly from a remote file if it contains a BOM. So I use Tempfile to create a temporary file, and then I call CSV.open with 'r:bom|utf-8':

doc = Document.find(doc_id)

path = "#{Rails.root.join('tmp')}/#{doc.name.split('.').first}_#{Time.now.to_i}.csv"

file = Tempfile.new(["#{doc.name.split('.').first}_#{Time.now.to_i}", '.csv']) 
file.binmode
file << open(doc.file.url).read
file.close

CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
    CSV.open(file.path, 'r:bom|utf-8', headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n").each_with_index do |line, index|

        # do something

    end
end 

Now, it seems to parse the file.
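A possible alternative that avoids the temporary file is to strip the BOM from the downloaded bytes before handing them to CSV.parse. This is only a sketch under assumptions: `strip_utf8_bom` is a hypothetical helper, the `raw` string stands in for the body returned by open(doc.file.url).read, and the content after the BOM is assumed to be valid UTF-8.

```ruby
require 'csv'

# Hypothetical helper: removes a leading UTF-8 BOM (EF BB BF) from raw bytes
# and returns the remainder tagged as UTF-8.
def strip_utf8_bom(raw)
  bytes = raw.b
  bytes = bytes.byteslice(3..-1) if bytes.start_with?("\xEF\xBB\xBF".b)
  bytes.force_encoding(Encoding::UTF_8)
end

# Stand-in for the remote file body (assumption: the real data comes from the download).
raw = "\xEF\xBB\xBFid;name\r\n1;\"Anna\"\r\n".b

rows = CSV.parse(strip_utf8_bom(raw), headers: true, col_sep: ';', row_sep: "\r\n")
rows.each { |row| puts row['name'] }
```

Because the BOM is removed at the byte level, the first header is plain "id" and a quoted first field starts directly with the quote character.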

wawka