To explore performance questions, it's useful to create a test bed capable of exposing differences. The following commands establish an environment with a sufficient number of rows to demonstrate the relative performance of different queries (adjust the parameters in p
to model different data characteristics):
CREATE TABLE users (
id integer PRIMARY KEY,
username text,
trader_id integer);
CREATE TABLE addresses (
id integer PRIMARY KEY,
street text,
trader_id integer);
WITH p(num_users, num_traders, mean_trader_addresses) AS (
VALUES (20000, 10000, 100)
),
new_users AS (
INSERT INTO users (id, username, trader_id)
SELECT s.n, gen_random_uuid()::text, (random() * p.num_traders)::integer + 1
FROM p
CROSS JOIN LATERAL generate_series(1, p.num_users) s(n))
INSERT INTO addresses(id, street, trader_id)
SELECT s.n, gen_random_uuid()::text, (random() * p.num_traders)::integer % p.num_traders + 1
FROM p
CROSS JOIN LATERAL generate_series(1, p.num_traders * p.mean_trader_addresses) s(n);
CREATE INDEX users_by_trader_id ON users(trader_id);
CREATE INDEX addresses_by_trader_id ON addresses(trader_id);
Several queries, each of which returns the OP's desired results, are listed below along with their mean completion times when run on my laptop.
As Erwin Brandstetter pointed out, the following performs poorly when querying for one user because the subquery has to process all addresses
even though only those associated with a single trader are of interest. This query performs better than the others when querying for all users.
SELECT users.*, address.id, address.street, address.trader_id
FROM users
LEFT JOIN (SELECT addresses.id, addresses.street, addresses.trader_id,
rank() OVER (PARTITION BY trader_id ORDER BY addresses.id DESC) AS rn
FROM addresses) address
ON users.trader_id = address.trader_id
AND address.rn = 1
WHERE users.id = 42;
3923.033 ms (3654.100 ms for all users)
This next version of the query performs significantly better than the prior one when querying for a single user; however, its performance when querying for all users is demonstrably worse.
EXPLAIN (ANALYZE, buffers, VERBOSE)
SELECT DISTINCT ON (users.id) users.*, addresses.*
FROM users
LEFT JOIN addresses
ON users.id = addresses.trader_id
WHERE users.id = 42
ORDER BY users.id, addresses.id DESC;
0.361 ms (5072.365 ms for all users)
This third version's performance is statistically identical to the previous one for single user queries and is only slightly worse than the first version's when querying all users.
SELECT users.*, address.*
FROM users
LEFT JOIN LATERAL (SELECT DISTINCT ON (addresses.trader_id) addresses.id, addresses.street, addresses.trader_id
FROM addresses
WHERE addresses.trader_id = users.id
ORDER BY addresses.trader_id, addresses.id DESC) address
ON TRUE
WHERE users.id = 42;
0.356 ms (3963.354 ms for all users)
Additional insite might be gained with a deeper analysis of the output from EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
.