For few rows (like 2 or 3 or 5 on avg., depends) in B
and C
per row in A
, DISTINCT ON
is typically fastest.
For many rows per row in A
, there are (much) more efficient solutions. And your information: "B and C are huge" indicates as much.
I suggest LATERAL
subqueries with ORDER BY
and LIMIT 1
, backed by a matching index.
SELECT A.pk AS pk_a, B.*, C.*
FROM A
LEFT JOIN LATERAL (
SELECT *
FROM B
WHERE B.fk_a = A.pk -- lateral reference
ORDER BY B.id DESC
LIMIT 1
) B ON true
LEFT JOIN LATERAL (
SELECT *
FROM C
WHERE C.fk_a = A.pk -- lateral reference
ORDER BY C.id DESC
LIMIT 1
) C ON true;
Assuming B.id
and C.id
are NOT NULL
.
You need at least indexes on the FK columns. Ideally, multi-column indexes on B (fk_a, id DESC)
and C (fk_a, id DESC)
though.
Use LEFT JOIN
! to not exclude rows from A
that are not referenced in either B
or C
. It would be an evil trap to use [INNER] JOIN
here, since you join to two unrelated tables.
Detailed explanation:
Related:
Simpler syntax with smart naming convention
The result of above query has pk_a
once and fk_a
twice. Useless ballast - and the same column name twice may be an actual problem, depending on your client.
You can spell out a column list in the outer SELECT
(instead of the syntax shortcut A.*, B.*
) to avoid redundancies. You may have to do that either way if there are more duplicate names or if you don't want all columns.
But with a smart naming convention, the USING
clause can fold the redundant PK and FK columns for you:
SELECT *
FROM A
LEFT JOIN LATERAL (
SELECT * FROM B
WHERE B.a_id = A.a_id
ORDER BY B.id DESC
LIMIT 1
) B USING (a_id)
LEFT JOIN LATERAL (
SELECT * FROM C
WHERE C.a_id = A.a_id
ORDER BY C.id DESC
LIMIT 1
) C USING (a_id);
Logically, USING (a_id)
is redundant here, since WHERE B.a_id = A.a_id
in the subquery already filters the same way. But the additional effect of USING
is that joining columns are folded to one instance. So only one a_id
remains in the result. The manual:
Furthermore, the output of JOIN USING
suppresses redundant columns:
there is no need to print both of the matched columns, since they must
have equal values. While JOIN ON
produces all columns from T1
followed
by all columns from T2
, JOIN USING
produces one output column for each
of the listed column pairs (in the listed order), followed by any
remaining columns from T1
, followed by any remaining columns from T2
.
It also typically makes a lot of sense to use the same name for the same data. So: a_id
for PK and FK columns.