1

I have a task like this:

namespace :company do
  task :update, [:code] => :environment do |t, args|
    company = Company.find_or_create_by(code: args[:code])
    company.update_from_local_data
  end
end

And this is the Company class.

class Company < ActiveRecord::Base
  has_many :items
  def update_from_local_data
    data = YAML.load(File.read(ENV['COMPANY_DATA_FILE']))
    update_items(data)
  end
  def update_items(item_array)
    item_array.each do |value|
      item = items.find_or_initialize_by(name: value[:name])
      item.update_attributes(value)
    end
  end
end

I confirmed that there is a lot of SELECT SQL query for this code.

In controller I can handle with it, but how can I use eager loading from rake task?

Edit

Thanks for Uri's comments I see how to improve performance for save several data to Database, but I still have problem how to call find_or_initialize_by for several items.

I found :on_duplicate_key_update option for ActiveRecord.import, but it can be used only with MySQL while I'm using PostgreSQL.

Edit 2

To explain what is the problem I created a example project.

This is a result of Company#update_from_local_data. I don't want SELECT query for every Items.

How can I write it more efficiently?

c = Company.first
c.update_from_local_data
  Item Load (0.2ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item0' LIMIT 1  [["company_id", 1]]
   (0.1ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item1' LIMIT 1  [["company_id", 1]]
   (0.0ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item2' LIMIT 1  [["company_id", 1]]
   (0.1ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item3' LIMIT 1  [["company_id", 1]]
   (0.0ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item4' LIMIT 1  [["company_id", 1]]
   (0.0ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item5' LIMIT 1  [["company_id", 1]]
   (0.0ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item6' LIMIT 1  [["company_id", 1]]
   (0.0ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item7' LIMIT 1  [["company_id", 1]]
   (0.0ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item8' LIMIT 1  [["company_id", 1]]
   (0.0ms)  begin transaction
   (0.0ms)  commit transaction
  Item Load (0.1ms)  SELECT  "items".* FROM "items"  WHERE "items"."company_id" = ? AND "items"."name" = 'item9' LIMIT 1  [["company_id", 1]]
   (0.0ms)  begin transaction
   (0.0ms)  commit transaction
=> [{:name=>"item0"}, {:name=>"item1"}, {:name=>"item2"}, {:name=>"item3"}, {:name=>"item4"}, {:name=>"item5"}, {:name=>"item6"}, {:name=>"item7"}, {:name=>"item8"}, {:name=>"item9"}]
ironsand
  • 14,329
  • 17
  • 83
  • 176

2 Answers2

2

You said you want to speed up the find_or_initialize_by step.

def update_items( items ) 
  # 'items' is an array of attributes hashes

  ActiveRecord::Base.transaction do

    names_array = items.map{ |attributes| attributes[:name] }
    existing_records = Company.where(name: names_array)

    records_by_name = existing_record.each_with_object({}) do |record, hash|
      name = record.name
      hash[name] = record
    end

    items.each do |attributes|
      name = attributes[:name]
      record = records_by_name[name] || Company.new

      # with validations and callbacks:
      #   record.update_attributes(attributes)

      # without validations:
      #   attributes.each{ |k, v| record[k] = v }
      #   record.save(validate: false)


      # without validations or callbacks:
      #   If you're using an older version of Rails,
      #   you can use record.save(:update_without_callbacks)
      #   For recent versions, you'll need to either write SQL-
      #   or disable all callbacks with skip_callbacks and then re-enable-
      #   them with set_callbacks
    end
  end
end

Basically, you find all the existing records in one go rather than executing individual search queries for each name.

SHS
  • 7,651
  • 3
  • 18
  • 28
0
  1. You can use update_all, which will return the number of entries updated. If 0 entries were updated, then you create the new record

    def update_items(item_array)
      item_array.each do |value|
        entries_updated = items.where(name: value[:name]).update_all(value)
        if entries_updated == 0
          items.create!(value)
        end
    end
    

    Notice that create! will raise an error if it can't create the record. You may wish to use just create and handle validation errors on your own.

  2. Another way you can go about it, based on the interface you suggested on chat items = load_all(item_array); items.update_all, is

    def update_items(item_array)
      grouped = item_array.group_by {|i| i[:name] }
      items.where(name: grouped.keys).each do |item|
        data = grouped[item.name]
        item.assign_attributes(data)
        item.save! if item.changed? 
      end
    end
    

    That would give you less queries if not all items are changed frequently, but can be slow if Company has thousands of items, but you could break item_array into smaller groups and then perform that. Notice that there's no way to produce one update statement that will change multiple records based on different criteria.

rafb3
  • 1,694
  • 12
  • 12
  • Thanks for your answer, but `name` column has unique value, unfortunately your code doesn't improve the performance in my case. – ironsand Dec 12 '14 at 00:40
  • Yes, name has a unique value. So what's gonna happen is that `Company.where.update_all` will always only update the one `Company` with that given name, meeting the line where you do `item.update_attributes(value)`. Then if nothing is updated, where it returns 0, means there's no company with that name, so it proceeds to create such company with the data in `value`. – rafb3 Dec 12 '14 at 01:27
  • That way, the code will always run 1 or 2 queries that will not require loading information from the db, because `update_all` doesn't load data from the db, just runs a `update`, which in your case is based on an index. Where your code will always run a select, and then load the data or not, and then run an update. – rafb3 Dec 12 '14 at 01:31
  • I noticed now, your code updating `Company` class while I want to update or initialize `Item` class. I couldn't adjust your answer to my original code. Am I missing something? – ironsand Dec 12 '14 at 01:50
  • ah true, you need to use `items`, updated the answer. All the conditions I mentioned still apply though. – rafb3 Dec 12 '14 at 02:19
  • oh, I get it. The problem is the `update` does access database to get `item` every time even if the local data file are not changed. – ironsand Dec 12 '14 at 02:55
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/66730/discussion-between-rafb3-and-tetsu). – rafb3 Dec 12 '14 at 10:07