
I have a large table, which I want to group by one column value and produce an aggregate of another column value. As an aggregate I don't care about the actual value as long as it's a value that appears in any of the rows of the grouped by column. Something like coalesce(), e.g. an aggregate that produces the first non-null value it receives in the input set.

Of course, coalesce() is not an aggregate function, and there is in fact no aggregate function in the documentation that matches the behavior I need.

What can I do to retrieve an arbitrary element for each group in a group by query? I know I could use min() or max(), but I'd rather avoid comparing all values against each other to identify the result. A solution that would stop hitting any more pages for a group that already has a value would be ideal. It's a big table (several GB on disk) with large groups (hundreds of thousands of rows each).

I have seen there are recursive CTE and lateral joins. I am trying to wrap my head around these, to see if these might help...

Here's an example:

with t1(x) as (select * from generate_series(0, 10, 1)),
     t2(x, y) as (select * from t1, t1 t2)

select x
     , any_element(y) -- how can I simulate this any_element() aggregate function?
from t2
group by x
order by x
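For reference, here is a sketch of the lateral-join idea (untested assumption on my part; whether it actually stops reading a group early depends on an index on the grouping column):

```sql
with t1(x) as (select * from generate_series(0, 10, 1)),
     t2(x, y) as (select * from t1 a, t1 b)

select g.x, s.y
from (select distinct x from t2) g
cross join lateral (
    -- limit 1 lets the subquery stop at the first matching row
    select y
    from t2
    where t2.x = g.x
      and y is not null
    limit 1
) s
order by g.x
```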
moooeeeep

  • If you don't care about the value, then just use `min()` or `max()`. –  Mar 02 '17 at 13:09
  • @a_horse_with_no_name that would be my fallback, but I'm especially interested in not having to compare all values in the group to find the min or max. I have looked at the [first/last aggregates](https://wiki.postgresql.org/wiki/First/last_(aggregate)), but I'm not sure how they work. Do they enable shortcuts? Are shortcuts inherently impossible? – moooeeeep Mar 02 '17 at 13:12
  • 1
    I don't think the comparison part of `min()` is what will slow you down. It's the scanning through all rows and creating the groups. Unless you have **a lot** of rows _per group_ (e.g. millions of rows for each group) finding the min (or max) will be the cheapest part of the whole step. –  Mar 02 '17 at 13:15
  • I don't know how the grouping is implemented. But a solution that would prevent hitting any more pages for a group that already has a value would be ideal. It's a big table with large groups. – moooeeeep Mar 02 '17 at 13:19
  • 2
    Any advice if the type you're aggregating doesn't have a min/max (in this case UUID). I know I can define a min/max function... but really hoping there was a sample() or first/last to really just grab any of the values. – xbakesx Feb 21 '21 at 21:25

1 Answer


`distinct on` keeps one arbitrary row per group, which is exactly what you need:

with t1(x) as (select * from generate_series(0, 10, 1)),
     t2(x, y) as (select * from t1 a, t1 b)

select distinct on (x) x,y
from t2
where y is not null
order by x

Or just use min/max as suggested in the comments.
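If the column type has no min/max (as asked in the comments, e.g. uuid), a custom aggregate in the style of the first/last aggregates from the PostgreSQL wiki is a sketch worth trying:

```sql
-- Sketch, following the wiki's first/last aggregate pattern:
-- a first() aggregate that works for any type, including uuid.
create or replace function first_agg(anyelement, anyelement)
returns anyelement
language sql immutable strict as
$$ select $1 $$;

create aggregate first (anyelement) (
    sfunc = first_agg,
    stype = anyelement
);

-- Because first_agg is strict, null inputs are skipped, so this
-- returns the first non-null y the executor happens to feed it.
select x, first(y) from t2 group by x order by x
```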

Clodoaldo Neto