0

I am importing a CSV file into a table called import_csv

The table had the columns first_name, last_name, email, phone_number, organization_id

I am importing the CSV with the following code

file = params[:file]
  filePath = file.path
  fileName = File.basename filePath

  pg = ActiveRecord::Base.connection
  rc = pg.raw_connection
  rc.exec("COPY import_csv (first_name, last_name, email, phone_number) FROM STDIN WITH CSV")

  file = File.open(filePath)
  file::gets

  while !file.eof?
    # Add row to copy data
    rc.put_copy_data(file.readline)
  end

I want to know how i can set the organization_id field without having to have it in the .CSV file that i am importing.

ricks
  • 3,154
  • 31
  • 51

2 Answers2

0
  1. Import your csv file to new table as it is.

  2. Now, modificate table schema with ADD COLUMN command:

    rc = pg.raw_connection

    ...

    rc.exec("ALTER TABLE import_csv ADD COLUMN organization_id integer;")

It should be done now.

Medvedu
  • 165
  • 1
  • 6
  • i am inserting into an already existing table that has the column – ricks Aug 25 '17 at 19:32
  • @RickS I see, as workaround solution you can create temporary table, import your csv into it and use psql 'insert' command from temporary to real table, as it described here https://stackoverflow.com/questions/12618232/copy-a-few-of-the-columns-of-a-csv-file-into-a-table – Medvedu Aug 25 '17 at 19:41
  • I tried that but it was taking 20x as longer using that method, i was hoping there was some way i could just modify the COPY command so i could add a value for a row but its been a few hours and i have not found anything – ricks Aug 25 '17 at 19:43
  • 1
    @Ricks if speed is important for you, you can read next article http://aserafin.pl/2016/06/06/fast-csv-imports-with-rails-and-postgresql/ a similar case described there. – Medvedu Aug 25 '17 at 19:51
  • you could set a default value for the column, `alter table X alter column Y set default Z` do the copy, and then reset the default. possibly all in a single transaction so that no others are effected. – Jasen Aug 26 '17 at 04:26
0

i was able to find a work around by modifying the CSV before the COPY command as follows:

require 'csv'

  # Load the original CSV file
  rows = CSV.read(filePath, headers: true).collect do |row|
    hash = row.to_hash
    hash.merge('phone_number' => '0')
    hash.merge('a1' => hash['organization_id'].to_s + '#{organization_id}' )
  end

  # Extract column names from first row of data
  column_names = rows.first.keys
  txt = CSV.generate do |csv|
    csv << column_names
    rows.each do |row|
      # Extract values for row of data
      csv << row.values
    end
  end

This allowed me to add the organization id without having to add it into the CSV.

if anyone has any better suggestions please let me know because while importing 21,000 record just that rewrite brought the time it takes to import from 0.045868 to 0.858213 seconds

ricks
  • 3,154
  • 31
  • 51