0

I have a migration which updates existing records with a new attribute value. The model is called 'MyRecord'. It has millions of records in the database with a new unit_id column of null. I want to update that unit_id column with a specific value:

MyRecord.find_each do |record|
  unit_id = Unit.calculate_unit_from_old_columns(record.legacy_column_1, record.legacy_column_2).first.id
  record.update unit_id: unit_id
end

This creates a lot of N+1 queries:

SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto' 
UPDATE my_records SET unit_id='43' WHERE legacy_column_1 = 'Legacy Electronics' AND legacy_column_2 = 'Legacy Auto';

And some of these N+1 queries are duplicated. I see a lot of this in logs:

SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto' 
SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto' 

I am familiar with eager loading via includes. But when this migration is run to update existing data, there will be no association yet. So I cannot do this:

record.includes(:unit)

How can I eliminate the N+1 queries and cache the query so it does not hit database again when s duplicate query?

Daniel Viglione
  • 8,014
  • 9
  • 67
  • 101
  • How complicated is `calculate_unit_from_old_columns`? It'd be best to move everything to the updating query (`SET unit_id = DO_STUFF(legacy_column_1, legacy_column_2)`. If it's too complicated, you should prepare batch updates (map 1000 records to `(id, new_unit_id)` pairs and use those for update queries. – Marcin Kołodziej Nov 14 '18 at 22:34
  • @MarcinKołodziej can you give me an example of batch updates or point me to a link? – Daniel Viglione Nov 14 '18 at 22:37
  • calculate_unit_from_old_columns is a named scope: scope : calculate_unit_from_old_columns, ->(item1, item2) { where(item_1: item1, item2: item_2 ) } – Daniel Viglione Nov 14 '18 at 22:38
  • Oh, right, it's quite a long line. Well, you can write a simple update with join. You tagged your question with MySQL and PostgreSQL and they have different syntax for that, it should be easily searchable. – Marcin Kołodziej Nov 14 '18 at 22:44
  • @MarcinKołodziej I am using MySQL – Daniel Viglione Nov 14 '18 at 22:45
  • @MarcinKołodziej I do not see how an update with join works here. The whole point of the migration is to create a relation between two tables that currently do not have a relation. – Daniel Viglione Nov 14 '18 at 22:50
  • What is an "N+1 query"? – Gordon Linoff Nov 15 '18 at 00:32
  • @GordonLinoff https://stackoverflow.com/questions/97197/what-is-the-n1-select-query-issue – Daniel Viglione Nov 15 '18 at 00:35

1 Answers1

0

Use a simple query, you can think of batching it if it runs for too long:

MyRecord.connection.execute(
  "UPDATE my_records, units 
   SET unit_id = units.id 
   WHERE units.item_1 = legacy_column_1 AND units.item_2 = legacy_column_2"
)
Marcin Kołodziej
  • 5,253
  • 1
  • 10
  • 17