I can't tell you the source of the memory leak, but I do spy some low-hanging fruit.
But first, two things:
1. Are you sure that ActiveRecord is the right way to copy data from one database to another? I'm very confident that it's not. Every major database product has robust export and import tools whose performance will be many, many times better than anything you can do in Ruby, and you can always invoke those tools from within your app (a sketch follows this list). Think hard about that before you continue down this path.

2. Where does the number 10,000 come from? Your code suggests that you know it's not a good idea to fetch all of the records at once, but 10,000 is still a lot of records. You may see some gains simply by trying different batch sizes: 100 or 1,000, say.
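For instance, here's a minimal sketch of invoking the native tools from Ruby, assuming PostgreSQL on both ends. The environment variable names and the `users` table are illustrative assumptions, not something from your setup:

```ruby
# A minimal sketch, assuming PostgreSQL on both databases. The env var
# names and the "users" table are assumptions; substitute your own.
source_url = ENV.fetch("SOURCE_DATABASE_URL")
target_url = ENV.fetch("TARGET_DATABASE_URL")

# pg_dump emits the rows as COPY statements; psql replays them
# against the target database.
dumped = system("pg_dump", "--data-only", "--table=users", source_url, "-f", "users.sql")
loaded = dumped && system("psql", target_url, "-f", "users.sql")
abort "export/import failed" unless loaded
```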
That said, let's dig into what this line is doing:
```ruby
users = User.limit(10000).offset(offset).as_json
```
The first part, `User.limit(10000).offset(offset)`, creates an `ActiveRecord::Relation` object representing your query. When you call `as_json` on it, the query is executed, which instantiates 10,000 User model objects and puts them in an array, and then a Hash is constructed from each of those User objects' attributes. (Take a look at the source for `ActiveRecord::Relation#as_json`.)
In other words, you're instantiating 10,000 User objects only to throw them away after you've got their attributes.
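To make that concrete, here's roughly what each step does (a sketch; the attribute keys shown depend on your schema):

```ruby
relation = User.limit(10000).offset(offset) # no query yet, just a Relation
users = relation.as_json                    # executes the query, instantiates
                                            # 10,000 User objects, then turns
                                            # each one into a Hash
users.first # => { "id" => 1, "name" => "...", ... }
```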
So, a quick win is to skip that part entirely. Just select the raw data:
```ruby
user_keys = User.attribute_names

until break_condition
  # ...
  # Note the splat: pluck takes column names as separate arguments.
  users_values = User.limit(10000).offset(offset).pluck(*user_keys)
  users_values.each do |vals|
    user_attrs = user_keys.zip(vals).to_h
    member = Member.find_by(name: user_attrs["name"])
    member.update_attributes(user_attrs)
  end
end
```
`ActiveRecord::Calculations#pluck` returns an array of arrays containing the values from each record. Inside the `users_values.each` loop we zip each values array with the column names to build a Hash. No User objects are instantiated at all. (Note the splat in `pluck(*user_keys)`: `pluck` expects the column names as separate arguments, not as a single array.)
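For illustration, assuming a users table with just `id`, `name`, and `email` columns (yours will have more):

```ruby
user_keys = ["id", "name", "email"] # assumed columns, for illustration
User.limit(2).pluck(*user_keys)
# => [[1, "Alice", "alice@example.com"],
#     [2, "Bob",   "bob@example.com"]]

user_keys.zip([1, "Alice", "alice@example.com"]).to_h
# => { "id" => 1, "name" => "Alice", "email" => "alice@example.com" }
```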
Now let's take a look at this:
```ruby
member = Member.find_by(name: user_attrs["name"])
member.update_attributes(user_attrs)
```
This selects a record from the database, instantiates a Member object, and then updates the record in the database, and it does so 10,000 times in every iteration of the `while` loop. This is the correct approach if you need validations to run when that record is updated. If you don't need validations to run, though, you can save time and memory by, again, not instantiating any objects:
```ruby
Member.where(name: user_attrs["name"]).update_all(user_attrs)
```
The difference is that `ActiveRecord::Relation#update_all` doesn't select the record from the database or instantiate a Member object; it just issues a single UPDATE. You said in your comment above that you have a unique constraint on the `name` column, so we know this will update only a single record.
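One caveat: `update_all` also skips callbacks and doesn't touch `updated_at`, so if you rely on that timestamp, merge it in yourself. A sketch:

```ruby
# update_all bypasses callbacks and automatic timestamps, so set
# updated_at explicitly if you need it to reflect this change.
Member.where(name: user_attrs["name"])
      .update_all(user_attrs.merge("updated_at" => Time.current))
```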
Having made those changes, you must still contend with the fact that each iteration of the `while` loop performs 10,000 UPDATE queries. Again, consider using your databases' built-in export and import functionality instead of trying to make Rails do this.
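If you do stick with Rails for now, one cheap mitigation, sketched here with the same variables as above, is to wrap each batch in a transaction so those 10,000 UPDATEs share a single commit instead of committing one at a time:

```ruby
# One commit per batch instead of one per UPDATE.
Member.transaction do
  users_values.each do |vals|
    user_attrs = user_keys.zip(vals).to_h
    Member.where(name: user_attrs["name"]).update_all(user_attrs)
  end
end
```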