
I've been searching high and low for a way to safely run delete_all within a model's method without incurring a web server timeout (say, 30 seconds).

Let's say the scenario is as follows: I may have 50K Item records and 150K related ItemHistory records. Clearly, using destroy_all (which loads an instance of each record and sends an individual DELETE for each) isn't optimal. How would you approach this problem? I also have delayed_job and tried its .delay method, but I don't believe it's a good fit for this issue. So I started looking at Threads, but I would like to use them safely.

Scenario #1 - destroying hundreds of thousands of expired records

Thread.new do
  Item.expired.find_each do |item|
    # destroy (not destroy_all) is the instance method; it also destroys
    # ItemHistory records, assuming the association uses dependent: :destroy
    item.destroy
  end
  ActiveRecord::Base.connection.close
end

Similar to the previous question, what are the pitfalls of using a transaction within a Thread (I imagine this is supported)?

Scenario #2 - using transaction within a Thread

Thread.new do
  ActiveRecord::Base.transaction do
    User.import(account_id)
    Item.import(account_id)
  end
  ActiveRecord::Base.connection.close
end
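One gotcha that applies to both scenarios: if the block raises, the `connection.close` line at the end is skipped, and the exception is re-raised only in whichever thread calls `join`. The plain-Ruby sketch below uses a hypothetical `FakeConnection` class standing in for the ActiveRecord connection; it is not ActiveRecord code, only an illustration of why cleanup belongs in an `ensure` clause.

```ruby
# Hypothetical stand-in for a database connection held by a worker thread.
class FakeConnection
  def initialize
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Runs the work in a new thread and always releases the connection,
# even when the work raises (the ensure clause still runs).
def run_in_thread(conn, &work)
  Thread.new do
    Thread.current.report_on_exception = false # surfaced via join instead
    begin
      work.call
    ensure
      conn.close
    end
  end
end

conn = FakeConnection.new
worker = run_in_thread(conn) { raise "import failed" }

error = nil
begin
  worker.join # re-raises the exception from the worker thread
rescue RuntimeError => e
  error = e.message
end

puts conn.closed? # => true
puts error        # => import failed
```

In the scenarios above, the equivalent would be moving `ActiveRecord::Base.connection.close` into an `ensure` clause.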

Are there any gotchas I need to consider?

user1322092
    This **is** a task for a background worker (DJ/resque/sidekiq, etc.) – Sergio Tulentsev Sep 23 '14 at 05:58
  • Also, unless you're on JRuby or Rubinius, use of threads is not worth it. – Sergio Tulentsev Sep 23 '14 at 06:00
  • Ok, ok... I get it - background job! :-) After you and Ryan weighed in, I scratched my head and asked myself what I was missing with delayed_job. I tried calling its `.delay` method on `item.destroy_all`, but the handler field in the delayed_job table stored an inordinate amount of text. So I re-read the delayed_job docs, and it dawned on me that I needed a "custom job". I'll update my question to mention using `.delay`. Ryan or Sergio, can you mention in your answer to create a custom job and wrap the deletion in `def perform..Item.expired.destroy_all...end`? Then I'll mark it as the answer. – user1322092 Sep 23 '14 at 22:58

2 Answers


This is best solved by moving the deletion to a background worker. I would recommend looking at Sidekiq.

Ryan Bigg
  • Thanks Ryan! It looks like my issue was calling `.delay` directly, as in `item.delay.destroy_all`, instead of wrapping the work in a custom job. Could you update your answer accordingly, for posterity? Then I'll mark it as the answer. Thanks! – user1322092 Sep 23 '14 at 23:02

You tried naive delaying, as in

Item.expired.delay.destroy_all

This is not good, because it will fetch all those items and serialize them in the job body. That's a huge amount of text.
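To see why the handler grows, the sketch below serializes two job objects with Ruby's standard `yaml` library: one that carries loaded records (roughly what delaying a method on a loaded collection amounts to) and one stateless custom job. `Record` and `NaiveJob` are hypothetical stand-ins, and the YAML shown is not delayed_job's exact handler format; only the size difference is the point.

```ruby
require "yaml"

# Hypothetical stand-in for a loaded Item row.
Record = Struct.new(:id, :name, :expires_at)

# Hypothetical job that carries its records, so they end up in the payload.
class NaiveJob
  def initialize(records)
    @records = records
  end
end

# Stateless custom job: it looks up the expired items only when it runs.
class PruneExpiredItems
  def perform
    # Item.expired.destroy_all would go here in the real app
  end
end

records = (1..1_000).map { |i| Record.new(i, "item #{i}", "2014-09-01") }

naive_payload  = YAML.dump(NaiveJob.new(records))
custom_payload = YAML.dump(PruneExpiredItems.new)

puts naive_payload.bytesize  # tens of kilobytes
puts custom_payload.bytesize # a few dozen bytes
```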

What you need to do instead is a specialized job. Something like this:

class PruneExpiredItems
  def perform
    Item.expired.destroy_all
  end
end

Delayed::Job.enqueue PruneExpiredItems.new
# or
PruneExpiredItems.new.delay.perform
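If the job needs parameters, delayed_job also accepts a Struct-based job, which keeps the serialized handler down to a few scalars. The sketch below adds a hypothetical `batch_size` parameter and deletes in batches. `FakeExpiredItems` and `ExpiredItems` are in-memory stand-ins for `Item.expired`, not ActiveRecord; in a real app `perform` would query the model instead.

```ruby
# Hypothetical in-memory stand-in for Item.expired (not ActiveRecord).
class FakeExpiredItems
  def initialize(ids)
    @ids = ids.dup
  end

  def count
    @ids.size
  end

  # Mimics SELECT id ... LIMIT n.
  def first_ids(limit)
    @ids.first(limit)
  end

  # Mimics DELETE ... WHERE id IN (...).
  def delete_ids(ids)
    @ids -= ids
  end
end

ExpiredItems = FakeExpiredItems.new((1..2_500).to_a)

# Struct-based job: only batch_size is stored in the job, not the records.
PruneExpiredItems = Struct.new(:batch_size) do
  def perform
    batches = 0
    loop do
      ids = ExpiredItems.first_ids(batch_size)
      break if ids.empty?

      ExpiredItems.delete_ids(ids)
      batches += 1
    end
    batches
  end
end

job = PruneExpiredItems.new(1_000)
# With delayed_job loaded, this would be: Delayed::Job.enqueue(job)
batches_run = job.perform
puts batches_run        # => 3
puts ExpiredItems.count # => 0
```

This follows the same idea as the custom job above: the handler stays small because the records are looked up when the job runs.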

Also, do you need destroy_all? That is, do you rely on it calling your callbacks (cascade delete and whatnot)? If not, you could try delete_all, which just sends a single DELETE FROM statement to the database. Much more efficient.
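A rough way to see the difference is to count statements. The sketch assumes `has_many :item_histories, dependent: :destroy`, uses the question's numbers (50K items, 150K histories), and ignores BEGIN/COMMIT; the counts are back-of-the-envelope estimates, not measurements. The delete_all figure assumes the child rows are removed with a second delete_all (or by a DB-level cascade).

```ruby
# Rough statement count for destroy_all, assuming dependent: :destroy.
# BEGIN/COMMIT statements are ignored.
def destroy_all_statements(items, histories)
  1 +           # SELECT the expired items
    items +     # one SELECT of item_histories per item
    histories + # one DELETE per item_history row
    items       # one DELETE per item row
end

# delete_all on both tables: one DELETE for item_histories, one for items.
DELETE_ALL_STATEMENTS = 2

puts destroy_all_statements(50_000, 150_000) # => 250001
puts DELETE_ALL_STATEMENTS                   # => 2
```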

Sergio Tulentsev
  • Thank you Sergio! I 'believe' I need to use `destroy_all` because of the dependent ItemHistory records. According to http://apidock.com/rails/ActiveRecord/Base/delete_all/class, "Be careful with relations though, in particular :dependent rules defined on associations are not honored." But perhaps I'm missing something. Yet http://stackoverflow.com/a/2797382/1322092 says `delete_all` destroys dependent objects. So it's not entirely clear. Thoughts? – user1322092 Sep 24 '14 at 11:11
  • @user1322092: that answer is worded ambiguously. `delete_all` does not delete/destroy associated records; `destroy_all` does. You may, however, be able to enforce this at the DB level (so-called CASCADE DELETE), so that when a parent row is deleted, the DB cleans up the now-orphaned child rows. The choice of which method to use depends on many factors (such as performance and DB capabilities). – Sergio Tulentsev Sep 24 '14 at 11:23
  • In short, if performance is acceptable with `destroy_all`, use that. – Sergio Tulentsev Sep 24 '14 at 11:24