2

I 'm working on a daily scheduled rake task that will download a CSV that is automatically sent to Dropbox every day, parse it and save to the database. I don't have control over the way the data is entered into the program that generates the CSV reports for this, so I can't avoid there being double quotes used in some of the data. However, I am wondering if there is a way I can strip or replace them with single quotes within the rake task or somehow inform the parser so it doesn't throw this error.

Rake task code:

require 'net/http'
require 'csv'
require 'open-uri'

namespace :fp_import do
    desc "download abc_relations from dropbox, save as csv, create or update record in db"
    task :fp => :environment do
        data = URI.parse("<<file's dropbox link>>").read

       File.open(Rails.root.join('lib/assets', 'fp_relation.csv'), 'w') do |file|
         file.write(data)
       end

       file= Rails.root.join('lib/assets', 'fp_relation.csv')

        CSV.foreach(file) do |row|
            div, fg_style, fg_color, factory, part_style, part_color, comp_code, vendor, design_no, comp_type = row
            fg_sku = fg_style + "-" + fg_color
            part_sku = part_style + "-" + part_color

            relation = FgPart.where('part_sku LIKE ? AND fg_sku LIKE?', "%#{part_sku}%", "%#{fg_sku}%").exists?
            if relation == false

                FgPart.create(fg_style: fg_style, fg_color: fg_color, fg_sku: fg_sku, factory: factory, part_style: part_style, part_color: part_color, part_sku: part_sku, comp_code: comp_code, comp_type: comp_type, design_no: design_no)
            end
        end
    end
end

There are about 35,000 rows in this CSV. Below is a sample. You can see the double quotes in the 4th row of the sample.

Sample data:

"01","502210","018","ZH","5931","001","M","","UPHOLSTERED GLIDER A","RM"
"01","502310","053","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","502310","065","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"
"01","503210","277","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","503310","076","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","506210","018","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","506210","467","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","507610","932","AZ","25332","NO","O","","GLIDER","BAG"
"01","507610","932","AZ","5936","001","M","","GLIDER","RM"
Tatiana Frank
  • 418
  • 5
  • 22
  • @Flip I'm not sure if your correction is correct. @ Tatiane: Are the '**' part of your csv-data or are they used to mark the critical code? – knut Mar 01 '16 at 22:39
  • If all data are similar to this extract you may remove all " before you use csv. – knut Mar 01 '16 at 22:41
  • @knut: I see, you might be right..will undo that part. Thanks for pointing it out. – Flip Mar 01 '16 at 22:42
  • @knut you're right, I used the '**' to highlight the critical code, it's not actually part of the data. – Tatiana Frank Mar 02 '16 at 14:28
  • You can't guarantee anything about malformed CSV lines, so you'd be better off rejecting bad lines and cleaning them up later. – Todd A. Jacobs Mar 03 '16 at 01:14

2 Answers2

3

The source CSV is malformed, quotes should be escaped before.

I would edit the file before parsing it with CSV and remove quotes between commas, and replace double quotes with simple ones, you can create a new file in case you don't want to edit the original.

def fix_csv(file)
  out = File.open("fixed_"+file, 'w')
  File.readlines(file).each do |line|
    line = line[1...-2] #remove beggining and end quotes
    line.gsub!(/","/,",") #remove all quotes between commas
    line.gsub!(/"/,"'") #replace double quotes to single
    out << line +"\n" #add the line plus endline to output
  end

  out.close
  return "fixed_"+file
end

In case you want to modify the same CSV file, you can do it this way:

require 'tempfile'
require 'fileutils'

def modify_csv(file)
  temp_file = Tempfile.new('temp')
  begin
    File.readlines(file).each do |line|
      line = line[1...-2]
      line.gsub!(/","/,",")
      line.gsub!(/"/,"'")
      temp_file << line +"\n"
    end
    temp_file.close
    FileUtils.mv(temp_file.path, file)
  ensure
    temp_file.close
    temp_file.unlink
  end
end

This is explained here in case you want to take a look, this will fix or sanitize your original CSV file

Community
  • 1
  • 1
Agush
  • 5,036
  • 1
  • 19
  • 32
  • Why would you want to fix it and then parse it again? After you fixed it, it is already parsed and ready to import. – Pascal Mar 02 '16 at 05:44
  • @pascalbetz just in case you don't want to modify the original csv – Agush Mar 02 '16 at 07:30
  • I see. Unless you need the cleaned up file for another process you can leave it as is and import it to AR once it is cleaned. So no need to read, clean, write, read, import. – Pascal Mar 02 '16 at 07:33
  • @pascalbetz Yeah thanks, I know that. The thing is this way the code we already have in the answer doesn't have to be modified, the cleanup logic is separated from the processing logic. – Agush Mar 02 '16 at 07:36
  • Hi @agush, I would like to keep only 1 csv with the correct format instead of creating 2. I'm new to the gsub! method. If I just use line.gsub!(/"/,"'"), how do I save the change to the existing file? – Tatiana Frank Mar 02 '16 at 16:53
  • Hey @TatianaFrank I edited my answer and added the code to modify the same file. – Agush Mar 02 '16 at 18:48
  • @agush thank you this worked! The only odd thing is that every row has single quote ' at the end of it. Was that meant to happen? `01,0020,001,TC,22395,NO,P,,BED RAILS VER A,PEN'` `01,0020,001,TC,23487,260,W,,BED RAILS VER A,RM'` `01,0020,001,TC,23488,988,O,,BED RAILS VER A,BAG'` – Tatiana Frank Mar 02 '16 at 20:50
  • @TatianaFrank yes I just realized you have to cut one extra character if you take into the account the new line, so it should be `line[1...-2]` I have edited the answer so it's correct now – Agush Mar 03 '16 at 00:53
  • Thanks @Agush!! I needed to make it [1...-3] for some reason. – Tatiana Frank Mar 04 '16 at 16:00
  • @TatianaFrank glad it worked, you should mark the question as answered then – Agush Mar 08 '16 at 21:32
2

The CSV is invalid, the quotes should be escaped. If there is no other special treatment required you could read the file line by line, split by , and remove leading/trailing ":

File.foreach(path) do |line|
  columns = line.split(',').map do |column|
    column[1...-1]
  end
  do_something_with_data(columns)
end

UPDATED VERSION

file = Kernel.open(File.join(__dir__, 'input.almost_csv'))
file.each do |line|
  values = line.split(',')
  values = values.map do |value|
    value[1...-1] # Remove leading and trailing double-quote
  end

  div, fg_style, fg_color, factory, part_style, part_color, comp_code, vendor, design_no, comp_type = values
  fg_sku = fg_style + "-" + fg_color
  part_sku = part_style + "-" + part_color

  if !FgPart.where('part_sku LIKE ? AND fg_sku LIKE?', "%#{part_sku}%", "%#{fg_sku}%").exists?
    FgPart.create(fg_style: fg_style, fg_color: fg_color, fg_sku: fg_sku, factory: factory, part_style: part_style, part_color: part_color, part_sku: part_sku, comp_code: comp_code, comp_type: comp_type, design_no: design_no)
  end

end

Note that:

  • You don't need the @ local scope variables are enough.
  • If you want to remove the quotes inside the string as well, you can manipulate the values inside the map block
  • This only works if you don't have the column separator , in the values
Pascal
  • 8,464
  • 1
  • 20
  • 31
  • thanks pascal! Im not sure where this is meant to go on my code, sorry. Could you please add a bit more explanation as to where to incorporate this in the code I posted? – Tatiana Frank Mar 02 '16 at 16:58