I am trying to make the query to get each year with a count of the number of movies in that year that had casts that all were not male (for each year, count the number of movies in that year that had no males).
These are the tables:
ACTOR (id, fname, lname, gender)
MOVIE (id, name, year)
CASTS (pid, mid, role) -- pid refers to actor id, mid refers to movie id
This is what I indexed (id
for the tables are primary keys, so they're already indexed, or so I assume):
CREATE INDEX gender_index on actor(gender);
CREATE INDEX movie_name_index on movie(name);
CREATE INDEX movie_year_index on movie(year);
CREATE INDEX casts_index on casts(pid, mid, role);
CREATE INDEX casts_pid_index on casts(pid);
CREATE INDEX casts_mid_index on casts(mid);
CREATE INDEX casts_role_index on casts(role);
This is my query:
SELECT m.year, count(m.id)
FROM movie as m
WHERE m.id NOT IN (
SELECT DISTINCT m.id
FROM movie as m, casts as c, actor as a
WHERE m.id = c.mid and a.id = c.pid and a.gender = 'M'
)
GROUP BY m.year
ORDER BY m.year
The query takes forever (and never completes), so how can I make this fast? Does using NOT EXISTS
help, although I thought the optimizer takes care of this? Do I need to index anything else? Is there another query that is better? I'm using PostgreSQL if this makes any difference.
Here is the EXPLAIN
:
"GroupAggregate (cost=1512539.61..171886457832.52 rows=61 width=8)"
" Group Key: m.year"
" -> Index Scan using movie_year_index on movie m (cost=1512539.61..171886453988.38 rows=768706 width=8)"
" Filter: (NOT (SubPlan 1))"
" SubPlan 1"
" -> Materialize (cost=1512539.18..1732298.66 rows=1537411 width=4)"
" -> Unique (cost=1512539.18..1718605.60 rows=1537411 width=4)"
" -> Merge Join (cost=1512539.18..1700559.32 rows=7218511 width=4)"
" Merge Cond: (m_1.id = c.mid)"
" -> Index Only Scan using movie_pkey on movie m_1 (cost=0.43..57863.94 rows=1537411 width=4)"
" -> Materialize (cost=1512531.37..1548623.92 rows=7218511 width=4)"
" -> Sort (cost=1512531.37..1530577.65 rows=7218511 width=4)"
" Sort Key: c.mid"
" -> Hash Join (cost=54546.59..492838.95 rows=7218511 width=4)"
" Hash Cond: (c.pid = a.id)"
" -> Seq Scan on casts c (cost=0.00..186246.43 rows=11445843 width=8)"
" -> Hash (cost=35248.91..35248.91 rows=1176214 width=4)"
" -> Seq Scan on actor a (cost=0.00..35248.91 rows=1176214 width=4)"
" Filter: ((gender)::text = 'M'::text)"