0

I don't know much at all about SQL, I've just toyed with it here and there through the years but never really 'used' it.

I'm trying to get a list of prices / volumes and aggregate them:

CREATE TABLE IF NOT EXISTS test (
    ts timestamp without time zone NOT NULL,
    price decimal NOT NULL,
    volume decimal NOT NULL
);

what I'd like is to extract:

  • min price
  • max price
  • sum volume
  • sum (price * volume) / sum (volume)

By 1h slices

If I forget about the last line for now, I have:

SELECT MIN(price) min_price, MAX(price) max_price, SUM(volume) sum_vol, date_trunc('hour', ts) ts_group FROM test
GROUP BY ts_group;

My understanding is that 'GROUP BY ts_group' will calculate ts_group, build groups of rows and then apply the MIN / MAX / SUM functions after. Since the syntax doesn't make any sense to me (entries on the select line would be treated differently while being declared together vs. building groups and then declaring operations on the groups), I could be dramatically wrong here.

But that will not return the min_price, max_price and sum_vol results after the grouping; I get ts, price and volume in the results.

If I remove the GROUP BY line to try to see all the output, I get the error:

column "test.ts" must appear in the GROUP BY clause or be used in an aggregate function

Which I don't really understand either...

I looked at: must appear in the GROUP BY clause or be used in an aggregate function but I don't really get it and I looked at the doc (https://www.postgresqltutorial.com/postgresql-group-by/) which shows working example, but doesn't really clarify what is wrong with what I'm trying to do here.

While I'd be happy to have a working solution, I'm more looking from an explanation, or pointers toward good resources, that would allow me to understand this.


I have this working solution:

SELECT MIN(price) min_price, MAX(price) max_price, SUM(volume) sum_vol, (SUM(price * volume)/SUM(volume)) vwap FROM test
GROUP BY date_trunc('hour', ts);

but I still don't understand the error message from my question

Zoe
  • 27,060
  • 21
  • 118
  • 148
Thomas
  • 10,933
  • 14
  • 65
  • 136
  • 1
    I think you need to review tutorials on SQL, whether in text or video. The `select` specifies the expressions that are returned. The `group by` (when specified) returns one row per group. It seems pretty intuitive once you are used to it. – Gordon Linoff Jul 29 '21 at 20:24
  • I understand the concept, but I think I'm failing with the syntax. I just realized for example that the group by can have an expression that is not listed in the select part (because it was the case in all examples I had read), so it sheds some light right away. – Thomas Jul 29 '21 at 20:31
  • You cannot "get ts, price and volume in the results" for that query, because you requested at least `MIN(price) min_price`. Please, check [the documentation](https://www.postgresql.org/docs/12/tutorial-agg.html). Not aggregated expressions in the select list should be functions of aggregation expressions or expressions in the `group by`, i.e. have one value for group – astentx Jul 29 '21 at 20:37

2 Answers2

0

All of your expressions in SQL must use data elements and functions that are known to PostgreSQL. In your first example, ts_group is neither an element of your table, nor a defined function, so it complained that it did not know how to calculate it.

Your second example works because date_trunc is a known function and ts is defined as a data element of the test table.

It also gets you the correct grouping (by hour intervals) because date_trunc 'blurs' all of those unique timestamps that by themselves would not combine into groups.

Chris Maurer
  • 2,339
  • 1
  • 9
  • 8
0

Without a GROUP BY, then having any aggregates in your select list means it will aggregate everything down to just one row. But how does it aggregate date_trunc('hour', ts) down to one row, since there is no aggregating function specified for it? If you were using MySQL, it would just pick some arbitrary value for the column from all the seen values and report that as the "aggregate". But PostgreSQL is not so cavalier with your data. If your query is vague in this way, it refuses to run it. If you just want to see some value from the set without caring which one it is, you could use min or max to aggregate it.

Since the syntax doesn't make any sense to me (entries on the select line would be treated differently while being declared together vs. building groups and then declaring operations on the groups),

You are trying to understand SQL as if it were C. But it is very different. Just learn it for what it is, without trying to force it to be something else. The select list is where you define the columns you want to see in the output. They may be computed in different ways, but what they have in common is that you want each of them to show up in the output, so they are listed together in that spot.

jjanes
  • 37,812
  • 5
  • 27
  • 34