I am testing against the Sakila database, see http://www.postgresqltutorial.com/postgresql-sample-database/. This database holds three relations:
- film: film_id, title
- actor: actor_id, first_name
- film_actor: film_id, actor_id
I want to list all films and for each film, I want to to list all actors playing in that particular film. I ended with the following query:
select film_id, title, array
(
select first_name
from actor
inner join film_actor
on actor.actor_id = film_actor.actor_id
where film_actor.film_id = film.film_id
) as actors
from film
order by title;
Conceptually, this is a 1 + n query:
one query: get films
n queries: for each film f
f.actors = array(get actors playing in f)
I always understood that 1 + n queries should be avoided at all cost, as this does not scale well.
So this made me wondering: how does PostgreSQL implement this internally? Let's say we have 1000 movies, does it internally execute 1000 select actor.first_name from actor inner join ...
queries? Or is PostgreSQL smarter about this and does it something like the following?
1. one query: get films
2. one query: get actors related to these films while keeping reference to film_id
3. internally: for each film f
f.actors = array(subset of (2) according to film_id)
This does 1 + 1 queries.