We are talking about the ordered-set aggregate function mode(), introduced with Postgres 9.4. You probably saw this error message:
ERROR: OVER is not supported for ordered-set aggregate mode
We can work around it. But which mode exactly?
(All assuming group_id
and type
are NOT NULL
, else you need to do more.)
Mode of qualifying rows
This computes the mode based on the filtered set (with type = 'customer'
) alone.
You get the most popular color per group among "customers".
A subquery in a plain JOIN
(without LEFT
and LATERAL
in this case) would do the job - calculating the mode once per group, not for every individual row:
SELECT u1.id, u1.group_id, u1.color, u2.mode_color
FROM users u1
JOIN ( -- not LATERAL
SELECT group_id, type -- propagate out for the join
, mode() WITHIN GROUP (ORDER BY color) AS mode_color
FROM users
WHERE type = 'customer' -- place condition in subquery (cheap)
GROUP BY group_id, type
) u2 USING (group_id, type); -- shorthand syntax for matching names
-- WHERE type = 'customer' -- or filter later (expensive)
To avoid repeating your condition, place it in the subquery and propagate it to the outer query in the join clause - I picked matching column names and joined with USING
in my example.
You can move the condition to the outer query or even to a later step, yet. It will be needlessly more expensive, though, as the mode for every combination of (group_id, type)
has to be calculated, before the results for every other type is excluded in a later step.
There are ways to parameterize your query. Prepared statements, PL/pgSQL function, see:
Or, if the underlying table does not change much, a materialized view with all pre-computed modes per (group_id, type)
replacing the subquery would be an option.
One more option: use a CTE to filter qualifying rows first, then the WHERE
condition can stay outside of the subquery like you requested:
WITH cte AS ( -- filter result rows first
SELECT id, group_id, color
FROM users u1
WHERE type = 'customer' -- predicate goes here
)
SELECT *
FROM cte u1
LEFT JOIN ( -- or JOIN, doesn't matter here
SELECT group_id
, mode() WITHIN GROUP (ORDER BY color) AS mode_color
FROM cte -- based on only qualifying rows
GROUP BY 1
) u2 USING (group_id);
We can simplify with SELECT *
since USING
conveniently places only one group_id
in the result set.
Mode of all rows
If you want to base the mode on all rows (including those where type = 'customer'
is not true), you need a different query.
You get the most popular color per group among all members.
Move the WHERE
clause to the outer query:
SELECT u1.id, u1.group_id, u1.color, u2.mode_color
FROM users u1
LEFT JOIN ( -- or JOIN, doesn't matter here
SELECT group_id
, mode() WITHIN GROUP (ORDER BY color) AS mode_color
FROM users
GROUP BY group_id
) u2 USING (group_id)
WHERE u1.type = 'customer';
If your predicate (type = 'customer'
) is selective enough, computing the mode for all groups may be a waste. Filter the small subset first and only compute the mode for contained groups. Add a CTE for this:
WITH cte AS ( -- filter result rows first
SELECT id, group_id, color
FROM users u1
WHERE type = 'customer'
)
SELECT *
FROM cte u1
LEFT JOIN ( -- or JOIN
SELECT group_id
, mode() WITHIN GROUP (ORDER BY color) AS mode_color
FROM (SELECT DISTINCT group_id FROM cte) g -- only relevant groups
JOIN users USING (group_id) -- but consider all rows for those
GROUP BY 1
) u2 USING (group_id);
Similar to the CTE query above, but based on all group members in the base table.