So the general rule of thumb with building these kinds of queries is to minimize work in "Ruby-land" and maximize work in "Database-land". In your solution above, you're fetching a set of markings with any tags in the set array
, which presumably will be a very large set (all posts that have any of those tags). This is represented in a ruby array and processed (group_by
is in Ruby-world, group
is the equivalent in Database-land).
So aside from being hard-to-read, that solution is going to be slow for any large set of markings.
There are a couple ways to solve the problem without doing any heavy lifting in Ruby-world. One way is using subqueries, like this:
scope :with_tag_ids, ->(tag_ids) {
tag_ids.map { |tag_id|
joins(:markings).where(markings: { tag_id: tag_id })
}.reduce(all) { |scope, subquery| scope.where(id: subquery) }
}
This generates a query like this (again for tag_ids 5 and 8)
SELECT "posts".*
FROM "posts"
WHERE "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 5)
AND "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 8)
Note that since everything here is calculated directly in SQL, no arrays are generated or processed in Ruby. This will generally scale much better.
Alternatively, you can use COUNT
and do it in a single query without subqueries:
scope :with_tag_ids, ->(tag_ids) {
joins(:markings).where(markings: { tag_id: tag_ids }).
group(:post_id).having('COUNT(posts.id) = ?', tag_ids.count)
}
Which generates SQL like this:
SELECT "posts".*
FROM "posts"
INNER JOIN "markings" ON "markings"."post_id" = "posts"."id"
WHERE "markings"."tag_id" IN (5, 8)
GROUP BY "post_id"
HAVING (COUNT(posts.id) = 2)
This assumes that you don't have multiple markings with the same pair of tag_id
and post_id
, which would throw off the count.
I would imagine that the last solution is probably the most efficient, but you should try different solutions and see what works best for your data.
See also: Query intersection with activerecord