The cheapest and simplest DISTINCT
operation is ... not to multiply rows in a "proxy cross join" in the first place. Aggregate first, then join. See:
Best for returning few selected rows
Assuming you actually don't want to retrieve the whole table, but just one or few selected photos at a time, with aggregated details. Then LATERAL
subqueries are fast and elegant:
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(c) AS comments
FROM comment c
WHERE photo_id = p.photo_id
) c1
CROSS JOIN LATERAL (
SELECT json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2; -- arbitrary selection
This returns whole rows from comment
and tag
, aggregated into JSON arrays separately. Rows are not multiplies like in your attempt, but they are only as "distinct" as they are in your base tables.
To additionally fold duplicates in the base data, see below.
Notes:
LATERAL
and json_agg()
require Postgres 9.3 or later.
json_agg(c)
is short for json_agg(c.*)
.
We do not need to LEFT JOIN
because an aggregate function like json_agg()
always returns a row.
Typically, you'd only want a subset of columns - at least excluding the redundant photo_id
:
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(json_build_object('comment_id', comment_id
, 'comment', comment)) AS comments
FROM comment
WHERE photo_id = p.photo_id
) c
CROSS JOIN LATERAL (
SELECT json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2;
json_build_object()
was introduced with Postgres 9.4. Used to be cumbersome in older versions because a ROW
constructor doesn't preserve column names. But there are generic workarounds:
Also allows to choose JSON key names freely, you don't have to stick to column names.
Best for returning all or most rows
SELECT p.*
, COALESCE(c1.comments, '[]') AS comments
, COALESCE(t.tags, '[]') AS tags
FROM photo p
LEFT JOIN (
SELECT photo_id
, json_agg(json_build_object('comment_id', comment_id
, 'comment', comment)) AS comments
FROM comment c
GROUP BY 1
) c1 USING (photo_id)
LEFT JOIN LATERAL (
SELECT photo_id , json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
GROUP BY 1
) t USING (photo_id);
Once we retrieve enough rows, this gets cheaper than LATERAL
subqueries. Works for Postgres 9.3+.
Note the USING
clause in the join condition. This way we can conveniently use SELECT *
in the outer query without getting duplicate columns for photo_id
. I didn't use SELECT *
here because your deleted answer indicates you want empty JSON arrays instead of NULL for no tags / no comments.
Also remove existing duplicates in base tables
You can't just json_agg(DISTINCT json_build_object(...))
because there is no equality operator for the data type json
. See:
There are various better ways:
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(to_json(c1.comment)) AS comments1
, json_agg(json_build_object('comment', c1.comment)) AS comments2
, json_agg(to_json(c1)) AS comments3
FROM (
SELECT DISTINCT c.comment -- folding dupes here
FROM comment c
WHERE c.photo_id = p.photo_id
-- ORDER BY comment -- any particular order?
) c1
) c2
CROSS JOIN LATERAL (
SELECT jsonb_agg(DISTINCT t) AS tags -- demonstrating jsonb_agg
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2;
Demonstrating 4 different techniques in comments1
, comments2
, comments3
(redundantly) and tags
.
db<>fiddle here
Old: sqlfiddle backpatched to Postgres 9.3; sqlfiddle for Postgres 9.6