Why aggregate functions can not be used with DISTINCT ON(...)?

Question

The question was:

How to get row which was selected by aggregate function?

The question was answered and partially resolve my problem. But I still can not replace GROUP BY with DISTINCT ON because of next reason:

I need both:

Select id of aggregated row (may be resolved with DISTINCT ON)
Sum the ratio column (may be resolved with GROUP BY)

Some amount of resource is consumed by user. One part of day 10h user consumed 8 another part of day 10h user consumed 3 and 4h he do not consume resource. The task is to bill consumed resource by the maximum and do not bill when resource was not consumed

 id | name | amount | ratio 
----+------+--------+-------
  1 | a    |      8 |    10
  2 | a    |      3 |    10

I accomplish this task by next query:

SELECT 
    (
       SELECT id FROM t2 
       WHERE id = ANY ( ARRAY_AGG( tf.id ) ) AND amount = MAX( tf.amount ) 
    ) id,
    name, 
    MAX(amount) ma,
    SUM( ratio )
FROM t2  tf
GROUP BY name

Why it is not allowed to use aggregation functions with DISTINCT ON?

select distinct on ( name ) id, name, amount, sum( ratio )
from t2
order by name, amount desc

Or even simpler:

select distinct on ( name ) id, name, max(amount), sum( ratio )
from t2

This will resolve also issues with ORDER BY. No need a workaround with subquery

Is there technical reasons which do not allow query from the last example to work as described?

UPD
In theory this can work like next:

First example:

select distinct on ( name ) id, name, amount, sum( ratio )
from t2
order by name, amount desc

When the first distinct row found, it saves its id and name

Next time when second and next non distinct row is found it will call to sum and accumulate ratio

Second example:

select distinct on ( name ) id, name, max(amount), sum( ratio )
from t2

When the first distinct row found, it saves its id and name, accumulate ratio and set current value of ratio as maximum

Next time when second and next non distinct row is found it will call to sum and accumulate ratio

If any of second and/or next non distinct row has greater value for ratio column it is saved as maximum and saved value for id is updated

UPD
if more than one row where amount = max(amount) Postgres can return value from either row. As this is done for any field which is not under DISTINCT ON

To be sure which of is returned the query maybe qualified by ORDER BY clause. Like this is done here

To replace `GROUP BY` by `DISTINCT`: https://stackoverflow.com/a/67167595/4632019 — Eugen Konkov, May 13 '21 at 13:29

S-Man · Accepted Answer · 2018-09-28T18:17:48.190

3

I am not quiet sure if I understood your problem completely (I didn't understand the part with the "10h user").

But I believe you are searching for window functions. I expanded my fiddle from the other question a little bit and added your SUM(ratio) with help of such a window function.

Is this what you expected?

demo: db<>fiddle

SELECT DISTINCT ON (name)
    id, 
    name, 
    amount,
    SUM(ratio) OVER (PARTITION BY name)
FROM test
ORDER BY name, amount DESC

Of course you can calculate the MAX(amount) with the same window function as well:

SELECT 
    id, 
    name, 
    max_amount, 
    sum_ratio 
FROM (
    SELECT 
        t.*,
        MAX(amount) OVER w as max_amount,
        SUM(ratio) OVER w as sum_ratio
    FROM test t
    WINDOW w as (PARTITION BY name)
    ORDER BY name
) s 
WHERE amount = max_amount

No need of GROUP BY. OK, but you are needing an extra subquery in this case where you have to filter the result of the window function (amount = max_amount)

edited Sep 28 '18 at 18:17

answered Sep 28 '18 at 13:30

S-Man

22,521
7
40
63

Great! Thank you. This resolve my problem, but this is not the answer to this question: Why – Eugen Konkov Sep 28 '18 at 14:04
Why: Because in your examples you are not telling what is your grouping frame. Maybe you could imagine a case where you want to group (sum, avg,max, whatever) by column X but need column Y as distinct. So it is necessary to give the grouping column. – S-Man Sep 28 '18 at 14:13
huh... This query does not select correct `id`. It selects any row `id` from current frame but not exact that which has maximum `amount` – Eugen Konkov Sep 28 '18 at 15:57
Why: for this simple case grouping frame is same as distinct on – Eugen Konkov Sep 28 '18 at 16:00
@EugenKonkov I'm so sorry you're right of course. I missed a thing. If you do a window function you get the right result of the calcalutions but you need to filter the right row anyway. I fixed the fiddle and edited the query. Thanks for advicing. – S-Man Sep 28 '18 at 18:19
Is `ORDER BY` clause required? – Eugen Konkov Sep 28 '18 at 19:27
1

@EugenKonkov No not really. It is just for visualization (ok, forgot to delete it after edit, it's late right now ;) ). – S-Man Sep 28 '18 at 19:32

score 0 · Answer 2 · answered Sep 28 '18 at 16:21

Answering to my question:

Is there technical reasons which do not allow query from the last example to work as described?

We must consider how we'd extract the id if there's more than one row where amount = max(amount)

 id | name | amount | ratio 
----+------+--------+-------
  1 | a    |      8 |    10
  2 | a    |      8 |    10

For this data the query above will generate error:

ERROR:  more than one row returned by a subquery used as an expression

Why aggregate functions can not be used with DISTINCT ON(...)?

2 Answers2

Linked