I have an ActiveRecord query that uses the OR operator to chain together 2 queries. The results come back fine, but the speed of executing the combined query is ~10 times as slow as executing either of the 2 queries in on their own.
We have an Event
model and an Invitation
model. A User
can be invited to an Event
by being targeted through an invitation filter, or by being individually invited by having an Invitation
record.
So when determining how many users are invited to a particular event, we have to look at all those with Invitations
and all those matching the filter. We do that here:
@invited_count = @invited_by_individual.or(@invited_by_filter).distinct.count(:id)
It's important to note, both @invited_by_individual
and @invited_by_filter
relations have references
and includes
statements within them.
Now, the problem is that when we execute that query, it takes about 1200ms. If we were to do the queries individually, each of them only take about 80ms. So @invited_by_filter.distinct.count
and @invited_by_individual.distinct.count
both return results in about 80ms, but neither of these is complete on its own.
Is there any way to speed up the query with the OR operator? Why is this happening in the first place?
Here is the SQL generated by the ActiveRecord queries:
Fast, single query:
(79.7ms)
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE "invitations"."event_id" = $1 [["event_id", 732]]
Slow, with combined query:
(1220.7ms)
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE ("invitations"."event_id" = $1 OR "users"."organization_id" = $2) [["event_id", 732], ["organization_id", 13]]
Update, here's the EXPLAIN:
(1418.2ms) SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]
=>
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]
#=> QUERY PLAN
Aggregate (cost=121781.56..121781.57 rows=1 width=8)
-> Hash Right Join (cost=113248.88..121778.64 rows=1165 width=8)
Hash Cond: (invitations.user_id = users.id)
Filter: ((users.root_organization_id = '-1'::integer) OR (invitations.event_id = 749))
-> Seq Scan on invitations (cost=0.00..1299.70 rows=63470 width=8)
-> Hash (cost=93513.28..93513.28 rows=1135328 width=12)
-> Seq Scan on users (cost=0.00..93513.28 rows=1135328 width=12)
(7 rows)
Update 2, EXPLAIN for queries ran individually, does use the indices:
(91.5ms) SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]
=>
EXPLAIN for: SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]
#=> QUERY PLAN
Aggregate (cost=19.05..19.06 rows=1 width=8)
-> Nested Loop (cost=0.72..19.05 rows=1 width=0)
-> Index Scan using index_users_on_root_organization_id on users (cost=0.43..4.45 rows=1 width=8)
Index Cond: (root_organization_id = '-1'::integer)
-> Index Only Scan using index_invitations_on_user_id on invitations (cost=0.29..14.57 rows=3 width=4)
Index Cond: (user_id = users.id)
(6 rows)
and
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "invitations"."event_id" = $1 [["event_id", 749]]
#=> QUERY PLAN
Aggregate (cost=536.34..536.35 rows=1 width=8)
-> Nested Loop (cost=0.72..536.19 rows=62 width=8)
-> Index Scan using index_invitations_on_event_id on invitations (cost=0.29..11.98 rows=62 width=4)
Index Cond: (event_id = 749)
-> Index Only Scan using users_pkey on users (cost=0.43..8.45 rows=1 width=8)
Index Cond: (id = invitations.user_id)
(6 rows)