How to retrieve other columns when performing an aggregate function?

Question

I've been trying to retrieve other columns from a table in which I'm performing an aggregate function to get the minimum number by date, this is an example of the data:

id   resource   date               quality   ask   ask_volume
1    1          2020-06-08 10:50   0         6.9   5102
2    1          2020-06-08 10:50   1         6.8   2943
3    1          2020-06-08 10:50   2         6.9   25338
4    1          2020-06-08 10:50   3         7.0   69720
5    1          2020-06-08 10:50   4         7.0   9778
6    1          2020-06-08 10:50   5         7.0   297435
7    1          2020-06-08 10:40   0         6.6   611
8    1          2020-06-08 10:40   1         6.6   4331
9    1          2020-06-08 10:40   2         6.7   1000
10   1          2020-06-08 10:40   3         7.0   69720
11   1          2020-06-08 10:40   4         7.0   9778
12   1          2020-06-08 10:40   5         7.0   297435
...

This is the desired result I'm trying to get, so I can perform a weighted average on it:

date               ask   ask_volume
2020-06-08 10:50   6.8   2943
2020-06-08 10:40   6.6   4331
...

Though both quality 0 and quality 1 have the same ask, quality 1 shall be chosen because its ask_volume is higher.

I have tried the classic:

SELECT date, min(ask) FROM table GROUP BY date;

But adding ask_volume to the column list will force me to add it to the GROUP BY as well, messing up the result.

The problems are:

How can I get the corresponding ask_volume of the minimum ask displayed in the result?
And, if there are two records with the same ask value on the same date, how can I get ask_volume to show the one with the highest value?

I use PostgreSQL, but SQL from a different database will help me get the idea as well.

score 0 · Accepted Answer · answered Jun 08 '20 at 15:34

In standard SQL, you would use window functions:

select *
from (
    select t.*, row_number() over(partition by date order by ask, ask_volume desc) rn
    from mytable 
) t
where rn = 1

In Postgres this is better suited for distinct on:

select distinct on (date) *
from mytable
order by ask, ask_volume desc

score 0 · Answer 2 · answered Jun 08 '20 at 15:34

0

You can do what you want with distinct on:

select distinct on (date) t.*
from (select t.*,
order by date, ask, ask_volume desc;

I find your date column confusing. It has a time component, so the name is misleading.

answered Jun 08 '20 at 15:34

Gordon Linoff

1,242,037
58
646
786

score 0 · Answer 3 · answered Jun 08 '20 at 15:57

Other answers are simpler and better, but here is an alternative to get around your aggregation problem. You could use a subquery to only include max ask_volume per date per ask before you get the min ask per date.

select date, min(ask), max(ask_volume) 
from t
where (date, ask_volume) in (select date, max(ask_volume) 
                             from t
                             group by date, ask)
group by date;

score 0 · Answer 4 · answered Jun 08 '20 at 21:48

DISTINCT ON has already been suggested, but in imperfect ways. (The currently accepted answer is incorrect.) That's how you do it:

SELECT DISTINCT ON (date) *
FROM   tbl
ORDER  BY date, ask, ask_volume DESC NULLS LAST;

Most importantly, leading expressions in ORDER BY must be in the set of expressions in DISTINCT ON. In other words for the simple case, date must be the first ORDER BY expression.

While null values have not been ruled out (with a NOT NULL constraint), you must add NULLS LAST or get null values first in descending order.

Detailed explanation:

Select first row in each GROUP BY group?

How to retrieve other columns when performing an aggregate function?

4 Answers4