I have recently started consulting on the development of a Rails application that uses MongoDB (with Mongoid as its object-document mapper) to store all its model instances.
This was fine while the application was in its early startup stage, but as it gained more and more clients, and as increasingly complicated queries were needed to show proper statistics and other information in the interface, we decided that the only viable way forward was to normalize the data and move to a relational database instead.
So, we are now in the process of migrating both the schema and the data from MongoDB (with Mongoid as object-mapper) to Postgres (with ActiveRecord as object-mapper). Because we have to make sure that no improper, non-normalized data makes it over from the Mongo database, we run these data migrations inside Rails-land, so that validations, callbacks and sanity checks are applied.
Everything went 'fine' in development, but now we are running the migrations on a staging server against the real production database. It turns out that for some migrations, the server's memory usage grows linearly with the number of model instances, causing the migration to be killed once we have filled 16 GB of RAM (and another 16 GB of swap...).
Since we migrate the model instances one by one, we hope there is a way to keep the memory usage (near) constant.
The possible causes that currently come to mind are (a) ActiveRecord or Mongoid keeping references to object instances we have already imported, and (b) the migration running in a single DB transaction, so that Postgres takes more and more memory until it is completed.
So my questions:
- What is the probable cause of this linear memory usage?
- How can we reduce it?
- Are there ways to make Mongoid and/or ActiveRecord relinquish old references?
- Should we attempt to call the Ruby GC manually? (A diagnostic sketch for this follows right after this list.)
- Are there ways to split a data migration into multiple DB transactions, and would that help? (See the restructured sketch below our current migration.)
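For the GC question, a diagnostic we have been considering (untested sketch; `Postgres::ModelName` is the placeholder model name from the migration below) is to periodically force a GC pass inside the import loop and log how many imported instances survive it:

# Every N imported records: force a full GC pass, then count how many
# Postgres::ModelName instances are still live. A steadily growing
# count would suggest something is retaining references to them.
GC.start
live_count = ObjectSpace.each_object(Postgres::ModelName).count
Rails.logger.info "live Postgres::ModelName instances: #{live_count}, heap_live_slots: #{GC.stat(:heap_live_slots)}"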
Our current data migrations have roughly the following format:
class MigrateSomeThing < ActiveRecord::Migration[5.2]
  def up
    # Mongoid's #all.each fetches documents in batches under the hood, see
    # https://stackoverflow.com/questions/7041224/finding-mongodb-records-in-batches-using-mongoid-ruby-adapter
    Mongodb::ModelName.all.each do |old_thing|
      create_thing(old_thing, Postgres::ModelName.new)
    end
    raise "Not all rows could be imported" if Mongodb::ModelName.count != Postgres::ModelName.count
  end

  def down
    Postgres::ModelName.delete_all
  end

  def create_thing(old_thing, new_thing)
    attrs = old_thing.attributes
    # ... maybe alter the attributes slightly to fit Postgres, depending on the thing.
    new_thing.attributes = attrs
    new_thing.save!
  end
end
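The restructuring we have in mind for the last question above would look roughly like this (untested sketch; `disable_ddl_transaction!` opts the migration out of the single wrapping transaction, and `each_slice` comes from Enumerable, which Mongoid criteria include):

class MigrateSomeThingInBatches < ActiveRecord::Migration[5.2]
  # Opt out of the transaction Rails normally wraps around the whole migration,
  # so we can commit in smaller chunks ourselves.
  disable_ddl_transaction!

  BATCH_SIZE = 1_000

  def up
    Mongodb::ModelName.all.each_slice(BATCH_SIZE) do |batch|
      # One transaction per batch instead of one for the entire migration.
      ActiveRecord::Base.transaction do
        batch.each { |old_thing| create_thing(old_thing, Postgres::ModelName.new) }
      end
      GC.start # give Ruby a chance to release the previous batch
    end
    raise "Not all rows could be imported" if Mongodb::ModelName.count != Postgres::ModelName.count
  end

  # `down` and `create_thing` would be the same as in the migration above.
end

Whether this would actually bound the memory usage is exactly what we are unsure about.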