I've got a small little ruby script that pours over 80,000 or so records.
The processor and memory load involved for each record is smaller than a smurf balls, but it still takes about 8 minutes to walk all the records.
I'd though to use threading, but when I gave it a go, my db ran out of connections. Sure it was when I attempted to connect 200 times, and really I could limit it better than that.. But when I'm pushing this code up to Heroku (where I have 20 connections for all workers to share), I don't want to chance blocking other processes because this one ramped up.
I have thought of refactoring the code so that it conjoins the all the SQL, but that is going to feel really really messy.
So I'm wondering is there a trick to letting the threads share connections? Given I don't expect the connection variable to change during processing, I am actually sort of surprised that the thread fork needs to create a new DB connection.
Well any help would be super cool (just like me).. thanks
SUPER CONTRIVED EXAMPLE
Below is a 100% contrived example. It does display the issue.
I am using ActiveRecord inside a very simple thread. It seems each thread is creating it's own connection to the database. I base that assumption on the warning message that follows.
START_TIME = Time.now
require 'rubygems'
require 'erb'
require "active_record"
@environment = 'development'
@dbconfig = YAML.load(ERB.new(File.read('config/database.yml')).result)
ActiveRecord::Base.establish_connection @dbconfig[@environment]
class Product < ActiveRecord::Base; end
ids = Product.pluck(:id)
p "after pluck #{Time.now.to_f - START_TIME.to_f}"
threads = [];
ids.each do |id|
threads << Thread.new {Product.where(:id => id).update_all(:product_status_id => 99); }
if(threads.size > 4)
threads.each(&:join)
threads = []
p "after thread join #{Time.now.to_f - START_TIME.to_f}"
end
end
p "#{Time.now.to_f - START_TIME.to_f}"
OUTPUT
"after pluck 0.6663269996643066"
DEPRECATION WARNING: Database connections will not be closed automatically, please close your
database connection at the end of the thread by calling `close` on your
connection. For example: ActiveRecord::Base.connection.close
. (called from mon_synchronize at /Users/davidrawk/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/monitor.rb:211)
.....
"after thread join 5.7263710498809814" #THIS HAPPENS AFTER THE FIRST JOIN.
.....
"after thread join 10.743254899978638" #THIS HAPPENS AFTER THE SECOND JOIN