2

Ruby 2.3.0, Rails 4.2.4, and actually using postgreSQL rather than SQLite

Updated for clarity

I have a large csv file (externally updated & downloaded daily) and wrote a method to update a Rails database table. I do not want the method to append all rows to the database without validating uniqueness, so I use this great solution (How do I make a column unique and index it in a Ruby on Rails migration?) with add_index.

I'm using a rake file to store the executable update code and I enter $ rake update_task in my terminal (which works IF the table has no duplicates with the imported csv rows). The problem with this is that the database ABORTS (rake aborted!) the rake when it encounters the first duplicate entry (ERROR: duplicate key value violates unique constraint).

What can I do to remove/not save any duplicates while avoiding aborting/failing? I cannot simply drop the database table and reload it each day. Here is the schema:

ActiveRecord::Schema.define(version: 20160117172450) do

# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"

  create_table "tablename", force: :cascade do |t|
    t.string   "attr1"
    t.string   "attr2"
    t.string   "attr3"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end

  add_index "tablename", ["attr1", "attr2", "attr3"], name: "index_tablename_on_attr1_and_attr2_and_attr3", unique: true, using: :btree

end

and my rake task in lib/tasks/file_name.rake contents:

desc "Download data and update database table"

task :update_task => :environment do
  u = CorrectClassName.new
  u.perform_this
end

and CorrectClassName is in an .rb file in app/directory1:

class CorrectClassName

  def perform_this
    something = ClassWithUpdateCode.new
    something.update_database
  end

end

and ClassWithUpdateCode is in an .rb file in app/directory2:

require 'csv'

class ClassWithUpdateCode

  def update_database
    csv_update = File.read(Rails.root.join('lib', 'assets', "file_name.csv"))
    options = {:headers => true}

    csv = CSV.parse(csv_update, options)
    csv.each do |row|
        tm = TableModel.new

        tm.attr1 = row[0]
        tm.attr2 = row[1]
        tm.attr3 = row[2]
        tm.save # maybe I can use a different method or if statement here?
    end
  end

end

Update: @Kristan's solution works below, but here is where to put the begin/rescue/end handling:

In the .rb file in app/directory2:

require 'csv'

class ClassWithUpdateCode

  def update_database
    csv_update = File.read(Rails.root.join('lib', 'assets', "file_name.csv"))
    options = {:headers => true}

    csv = CSV.parse(csv_update, options)
    csv.each do |row|
        tm = TableModel.new
        begin
          tm.attr1 = row[0]
          tm.attr2 = row[1]
          tm.attr3 = row[2]
          tm.save
        rescue ActiveRecord::RecordNotUnique
        end
    end
  end

end
Community
  • 1
  • 1
JHFirestarter
  • 63
  • 1
  • 9
  • What version of PostgreSQL are you using? 9.5 adds support for. `INSERT ... ON CONFLICT DO NOTHING` – Thomas Walpole Jan 23 '16 at 16:01
  • This may very well work (I am using 9.5), Tom...I just haven't tried raw SQL before! At some point in the future, I will need to speed up by not using the csv parsing and probably have /copy from and perhaps your SQL suggestion here. – JHFirestarter Jan 24 '16 at 16:14

1 Answers1

2

rake is bailing out because an exception gets raised when you try to save a record that violates your table's uniqueness constraint. The simplest way to prevent that is by catching and ignoring the exception. I'm assuming your record is created during u.perform_this.

task :update_task => :environment do
  u = CorrectClassName.new
  begin
    u.perform_this
  rescue ActiveRecord::RecordNotUnique
    # move on
  end
end

Another option is to add a uniqueness validation to your Rails model, then either check valid? before saving or call create (not create!), which doesn't raise validation exceptions.

class CorrectClassName < ActiveRecord::Base
  validates_uniqueness_of :attr1, scope: [:attr2, :attr3]
end
task :update_task => :environment do
  u = CorrectClassName.new(data)
  u.perform_this if u.valid?
end
Kristján
  • 18,165
  • 5
  • 50
  • 62
  • I was probably too wordy and also should have included the underlying method within "perform_this". The solution needs to (a) honor the uniqueness validation without (b) raising exceptions and aborting/failing. When I tried the begin/rescue/end solution, (b) was solved but not (a)--the table now has duplicates. Same with "if u.valid?" (even when "if u.valid?" is used in conjunction with the perform_this .save method...which I just updated in the event there's a solution). – JHFirestarter Jan 23 '16 at 20:52
  • If you have a unique database index on those attributes, it's impossible to end up with duplicates. Are you sure you've configured things properly? – Kristján Jan 24 '16 at 01:19
  • Ah, interesting...I needed to remove attr3 as a uniqueness constraint--thanks for helping me find that! Still, without the offending attr3 constraint, the begin/rescue/end handling does eliminate exceptions being raised...but the table is not updated (so the rake task is being aborted somehow). Is there some equivalent "move to next" syntax I should use between `rescue` and `end`? – JHFirestarter Jan 24 '16 at 15:55
  • Also, there still is an error with the `.valid?` solution. My custom class `CorrectClassName` is not recognized when inheriting ActiveRecord::Base...I get a `PG::UndefinedTable: ERROR: relation "file_name" does not exist` (where "file_name" is the .rb in app/directory1 in my code above, NOT in app/models). – JHFirestarter Jan 24 '16 at 15:56
  • Ah hah! Okay, when trying to be concise in the question, I made it hard for you to answer. The key is that you said "I'm assuming your record is created during `u.perform_this`." My apologies, the record is actually created _inside_ the `update_database` method (in class `ClassWithUpdateCode` above). So, `begin` `rescue` `end` works when I place it inside the do loop in that method. – JHFirestarter Jan 24 '16 at 16:04