22

I found that my Model.create! statements were taking a very long time to run when I added a large number of records at once. I looked at ActiveRecord-Import, but it didn't work with an array of hashes (which is what I have, and which I think is pretty common). How can I improve the performance?

Jack R-G
  • See also: [How to implement bulk insert in Rails 3](http://stackoverflow.com/questions/8505263/how-to-implement-bulk-insert-in-rails-3) and [Batch insertion in rails 3](http://stackoverflow.com/questions/15784305/batch-insertion-in-rails-3). –  Apr 13 '14 at 05:05
  • @Cupcake Those two references talk about using ActiveRecord-Import (which doesn't support an array of hashes like create! does) and writing SQL directly. See the accepted answer for what I did. – Jack R-G Apr 14 '14 at 17:46
  • this gem https://github.com/bjhaid/active_record_bulk_insert aims to solve this kind of challenge – bjhaid May 31 '14 at 12:31
  • may be it help https://cbabhusal.wordpress.com/2015/01/03/updating-multiple-records-at-the-same-time-rails-activerecord/ – Shiva Aug 18 '15 at 02:44
  • 1
    @illusionist That article addresses a very specific batch update: when all records are to be updated with the same value. For multiple record update with multiple values, the best approach is similar to that taken for bulk insert (see my answer on http://stackoverflow.com/questions/18999563/rails-faster-way-to-perform-updates-on-many-records/27081266#27081266) – Jack R-G Aug 19 '15 at 03:26

7 Answers

24

Use the activerecord-import gem. Say you are reading a CSV file, generating a product catalogue, and you want to insert records in batches of 1,000:

batch, batch_size = [], 1_000
CSV.foreach("/data/new_products.csv", :headers => true) do |row|
  batch << Product.new(row.to_h)   # convert the CSV::Row to a plain Hash

  if batch.size >= batch_size
    Product.import batch           # one multi-row INSERT per batch
    batch = []
  end
end
Product.import batch unless batch.empty?   # import any leftover records
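
If you already have an array of hashes, newer versions of activerecord-import also accept it directly (a minimal sketch; check your gem version, and note that the attribute names here are made up and all hashes should share the same keys):

products = [
  { :name => "Widget", :price => 10 },
  { :name => "Gadget", :price => 20 }
]
Product.import products   # builds one multi-row INSERT from the hashes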
Harish Shetty
  • 3
    I did mention activerecord-import in my answer, but it does not address my specific case, which is an array of hashes (I believe this to be a very typical use case; it certainly was for me). Bottom line: if ar-import had support for an array of hashes I would have used it and not written my own code. I put this out there as another alternative. – Jack R-G Mar 11 '13 at 02:54
  • 2
    I missed your mention of `activerecord-import`. The example I have given deals with array of hashes(The csv row is a hash). If you already have the array of hashes you can process them using the technique used above. – Harish Shetty Mar 11 '13 at 06:30
  • 2
    Looking into this in the active record docs and found that there's actually built in functionality to support that in the [create method](http://railsapi.com/doc/rails-v2.3.8/classes/ActiveRecord/Base.html#M000969)   `# Creating an Array of new objects using a block, where the block is executed for each object: User.create([{ :first_name => 'Jamie' }, { :first_name => 'Jeremy' }]) do |u|     u.is_admin = false   end` – Máté Jul 31 '13 at 11:22
  • 2
    @matov That's true but inefficient. It generates one SQL Insert for each record. Try doing that with 10000 records -- it's rather slow. – Jack R-G Apr 14 '14 at 17:38
  • @JackR-G Yeah, I'm aware. There are gems to support that sort of functionality, like the activerecord-import you mentioned, but I found it not good enough when dealing with larger data sets. – Máté May 15 '14 at 11:37
12

Thanks to Chris Heald @cheald for his 2009 article, which showed me that the best way to go was the multi-row INSERT command.

I added the following code to my initializers/active_record.rb file, changed my Model.create!(...) calls to Model.import!(...), and away it goes. A couple of caveats:

1) It does not validate the data.
2) It uses the form of the SQL INSERT command that reads like ...

INSERT INTO <table> (field-1, field-2, ...)
       VALUES (value-1-1, value-1-2, ...), (value-2-1, value-2-2, ...), ...

... which may not be the correct syntax for all databases, but it works with Postgres. It would not be difficult to alter the code for the appropriate syntax for your SQL version.

In my particular case, inserting 19K+ records into a simple table on my development machine (MacBook Pro with 8GB RAM, 2.4GHz Intel Core i5, and an SSD) went from 223 seconds using Model.create! to 7.2 seconds using Model.import!.

class ActiveRecord::Base

  # Bulk-insert an array of hashes using a single multi-row INSERT.
  # Caveat: no validations are run.
  def self.import!(record_list)
    raise ArgumentError, "record_list not an Array of Hashes" unless record_list.is_a?(Array) && record_list.all? { |rec| rec.is_a?(Hash) }
    key_list, value_list = convert_record_list(record_list)
    sql = "INSERT INTO #{self.table_name} (#{key_list.join(", ")}) VALUES #{value_list.map { |rec| "(#{rec.join(", ")})" }.join(", ")}"
    self.connection.insert_sql(sql)   # on newer Rails, connection.insert(sql) is the equivalent
  end

  # Build the union of keys across all records, then quote each value
  # (keys missing from a record become quoted nils, i.e. NULL).
  def self.convert_record_list(record_list)
    key_list = record_list.map(&:keys).flatten.uniq.sort

    value_list = record_list.map do |rec|
      key_list.map { |key| ActiveRecord::Base.connection.quote(rec[key]) }
    end

    [key_list, value_list]
  end
end
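
Usage is then a drop-in replacement for a batch of create! calls; a quick sketch (the model and attributes here are hypothetical):

Widget.import!([
  { :name => "foo", :size => 1 },
  { :name => "bar", :size => 2 }
])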
JJD
Jack R-G
5

I started running into problems with large numbers of records (> 10000), so I modified the code to work in groups of 1000 records at a time. Here is a link to the new code:

https://gist.github.com/jackrg/76ade1724bd816292e4e
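
The idea is simply to slice the array and issue one multi-row INSERT per slice. A minimal sketch, assuming the import! method from my answer above (the wrapper name here is illustrative; see the gist for the actual code):

class ActiveRecord::Base
  # Insert records in groups of batch_size so no single INSERT grows too large.
  def self.import_in_batches!(record_list, batch_size = 1_000)
    record_list.each_slice(batch_size) { |batch| import!(batch) }
  end
end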

Jack R-G
4

For Rails 6+, use the built-in insert_all. It generates a single multi-row INSERT and, like the approaches above, skips validations.
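
A minimal example (the model and column names are hypothetical):

Product.insert_all([
  { name: "Widget", price: 10 },
  { name: "Gadget", price: 20 }
])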

mattes
1

You can also use the activerecord-insert_many gem. Just pass it an array of hashes:

events = [{name: "Movie Night", time: "10:00"}, {name: "Tutoring", time: "7:00"}, ...]

Event.insert_many(events)
siegy22
Luke
0

Using a transaction speeds up bulk inserts a lot, because the database commits once at the end rather than once per record:

Model.transaction do
  many.times { Model.create! }
end

If multiple models are involved, open a transaction for each affected model (if both models share a database connection, Rails flattens the nested call into a single transaction):

Model1.transaction do
  Model2.transaction do
    many.times do
      m1 = Model1.create!
      m1.add_model2
    end
  end
end
tvw
  • 2
    I create a 1000 records in a transaction, it's still iterative and takes 30 seconds. I don't think this is a good solution. – max pleaner Feb 25 '17 at 00:55
  • It is. But it is only one part of the solution. To improve performance much more, you can use plain SQL. If you commit each record rather than using transactions, inserts are much slower. – tvw Apr 20 '17 at 11:18
0

https://stackoverflow.com/a/15318202/9732392 That answer is well explained, but in my opinion, importing an array of raw values instead of building Product.new(row) objects, as below, might be even faster:

product_columns = [:part_number, :item_name, :cost]   # defined before the loop so it is in scope for the final import
batch, batch_size = [], 1_000

# :header_converters => :symbol makes the row keys symbols (row[:part_number]);
# with :headers => true alone they would be strings.
CSV.foreach("/data/new_products.csv", :headers => true, :header_converters => :symbol) do |row|
  batch << [row[:part_number], row[:item_name], row[:cost]]

  if batch.size >= batch_size
    Product.import product_columns, batch, on_duplicate_key_update: {conflict_target: [:id], columns: [:part_number, :item_name, :cost]}
    batch = []
  end
end

# Import any remaining rows.
Product.import product_columns, batch, on_duplicate_key_update: {conflict_target: [:id], columns: [:part_number, :item_name, :cost]} if batch.present?

For more info, see https://github.com/zdennis/activerecord-import