Let's start with the most obvious approach first:
type_a_task_ids = [1,2,3,1,2,3]
type_b_task_ids = [1,2,2,3,3]
type_a_tasks = type_a_task_ids.map { |task_id| Task.includes(:project).find(task_id) }
type_b_tasks = type_b_task_ids.map { |task_id| Task.includes(:project).find(task_id) }
The above is simple and readable but potentially slow: it will perform one database round-trip for each distinct task_id as well as one database round-trip for each distinct project_id in the given tasks. All the latency adds up, so you want to load the tasks (and corresponding projects) in bulk.
It would be great if you could have Rails bulk-load (prefetch) and cache those same records upfront in, say, two round-trips (one for all distinct tasks and one for all distinct associated projects), and then just have the exact same code as above -- except find would always hit the cache instead of the database.
Unfortunately things don't quite work that way (by default) in Rails, as ActiveRecord uses a query cache, which caches result sets keyed by the exact SQL statement rather than individual records. Running Task.find(1) (SELECT * FROM tasks WHERE id=1) after Task.find([1,2,3]) (SELECT * FROM tasks WHERE id IN (1,2,3)) will not leverage the query cache, since the second query differs from the first. (Running Task.find(1) a second, third etc. time will leverage the query cache, though, as Rails will see the exact same SELECT query fly by multiple times and return the cached result set.)
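To make the miss-versus-hit behaviour concrete, here is a minimal plain-Ruby sketch of a cache keyed by SQL text. ToyQueryCache and its select method are hypothetical names invented for this illustration; this is not ActiveRecord's implementation, only a model of the behaviour described above.

```ruby
# Toy model of a SQL-keyed query cache (illustrative only, not ActiveRecord code).
class ToyQueryCache
  attr_reader :round_trips

  def initialize
    @results = {}     # SQL string => cached result set
    @round_trips = 0  # number of simulated database hits
  end

  # Returns the cached result for identical SQL; otherwise runs the block
  # (standing in for the database) and remembers what it returned.
  def select(sql)
    @results.fetch(sql) do
      @round_trips += 1
      @results[sql] = yield
    end
  end
end

cache = ToyQueryCache.new
cache.select("SELECT * FROM tasks WHERE id IN (1,2,3)") { [{ id: 1 }, { id: 2 }, { id: 3 }] }
cache.select("SELECT * FROM tasks WHERE id=1") { [{ id: 1 }] }  # different SQL: miss
cache.select("SELECT * FROM tasks WHERE id=1") { [{ id: 1 }] }  # identical SQL: hit

puts cache.round_trips  # prints 2
```

The bulk query already returned the row with ID 1, yet the single-row lookup still costs a round-trip because the cache only compares SQL strings.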
Enter IdentityMap caching. Identity Map Caching is different in the sense that it caches records, not queries, on a per-table-and-primary-key basis. Thus, running Task.find([1,2,3]) would fill out three records in the Identity Map Cache for table tasks (the entries with IDs 1, 2 and 3 respectively), and a subsequent Task.find(1) would promptly return the cached record for table tasks and ID 1.
# with IdentityMap turned on (see IdentityMap documentation)
type_a_task_ids = [1,2,3,1,2,3]
type_b_task_ids = [1,2,2,3,3]
# prefetch all distinct tasks (the union of both ID lists) and their associated projects
# throw away the result, we only want to prep the cache
Task.includes(:project).find(type_a_task_ids | type_b_task_ids)
# proceed with regular logic
type_a_tasks = type_a_task_ids.map { |task_id| Task.includes(:project).find(task_id) }
type_b_tasks = type_b_task_ids.map { |task_id| Task.includes(:project).find(task_id) }
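The per-table-and-primary-key idea can be sketched in plain Ruby as follows. ToyIdentityMap, its find signature and the in-memory database hash are all assumptions made up for this illustration; the sketch does not reproduce the Rails IdentityMap class, only the caching behaviour described above.

```ruby
# Toy identity map keyed by [table, primary key] (illustrative only).
class ToyIdentityMap
  attr_reader :round_trips

  def initialize(database)
    @database = database  # hypothetical stand-in: { table => { id => row } }
    @records = {}         # [table, id] => cached record
    @round_trips = 0      # number of simulated database hits
  end

  # Accepts one ID or an array of IDs; only IDs not yet cached cost a round-trip.
  def find(table, ids)
    missing = Array(ids).uniq.reject { |id| @records.key?([table, id]) }
    unless missing.empty?
      @round_trips += 1  # one simulated bulk query for all missing IDs
      missing.each { |id| @records[[table, id]] = @database.fetch(table).fetch(id) }
    end
    found = Array(ids).map { |id| @records.fetch([table, id]) }
    ids.is_a?(Array) ? found : found.first
  end
end

database = { tasks: { 1 => { id: 1 }, 2 => { id: 2 }, 3 => { id: 3 } } }
identity_map = ToyIdentityMap.new(database)

identity_map.find(:tasks, [1, 2, 3])  # one bulk round-trip fills the cache
task = identity_map.find(:tasks, 1)   # served from the cache, no round-trip

puts identity_map.round_trips  # prints 1
```

Because the cache is keyed by record rather than by SQL text, the shape of the query that originally loaded a record no longer matters.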
However, IdentityMap has never been active by default (for good reason), and was ultimately removed from Rails.
How do you achieve the same result without IdentityMap? Simple:
type_a_task_ids = [1,2,3,1,2,3]
type_b_task_ids = [1,2,2,3,3]
# prefetch all distinct tasks (the union of both ID lists) and their associated projects
# store the result in our own identity cache, keyed by task ID (index_by comes from ActiveSupport)
my_tasks_identity_map =
  Task.includes(:project).find(type_a_task_ids | type_b_task_ids).index_by(&:id)
# proceed with cache-centric logic
type_a_tasks = type_a_task_ids.map { |task_id| my_tasks_identity_map[task_id] }
type_b_tasks = type_b_task_ids.map { |task_id| my_tasks_identity_map[task_id] }
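One thing worth knowing about a hand-rolled map like this: Hash#[] returns nil for an ID that was never prefetched, whereas Task.find would have raised. The plain-Ruby sketch below uses Hash#fetch to fail loudly instead. TaskStub is a hypothetical stand-in for the ActiveRecord model so that the example runs without Rails; the lookup logic is otherwise the same as above.

```ruby
# TaskStub is a made-up stand-in for the Task model (assumption for this sketch).
TaskStub = Struct.new(:id, :name)

# Pretend these records came back from a single bulk query.
prefetched = [TaskStub.new(1, "a"), TaskStub.new(2, "b"), TaskStub.new(3, "c")]
my_tasks_identity_map = prefetched.to_h { |task| [task.id, task] }

type_a_task_ids = [1, 2, 3, 1, 2, 3]
type_a_tasks = type_a_task_ids.map { |task_id| my_tasks_identity_map.fetch(task_id) }

# Duplicated IDs resolve to the very same object, not to copies.
same_object = type_a_tasks[0].equal?(type_a_tasks[3])

# An ID that was never prefetched raises KeyError instead of yielding nil.
missing_error =
  begin
    my_tasks_identity_map.fetch(4)
    nil
  rescue KeyError => e
    e
  end

puts same_object          # prints true
puts missing_error.class  # prints KeyError
```

Whether to prefer fetch over [] depends on whether a missing ID should be treated as a bug or as an expected case in your code.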