right now I am grouping data by the minute:
SELECT
date_trunc('minute', ts) ts,
...
FROM binance_trades
GROUP BY date_trunc('minute', ts), instrument
ORDER BY ts
but I would like to group by 5 seconds.
I found this question: Postgresql SQL GROUP BY time interval with arbitrary accuracy (down to milli seconds)
with the answer:
SELECT date_bin(
INTERVAL '5 minutes',
measured_at,
TIMSTAMPTZ '2000-01-01'
),
sum(val)
FROM measurements
GROUP BY 1;
I am using v13.3, so date_bin doesn't exist on my version of Postgres.
I don't quite understand the other answers in that question as well.
Output of EXPLAIN:
Limit (cost=6379566.36..6388410.30 rows=13 width=48) (actual time=76238.072..76239.529 rows=13 loops=1)
-> GroupAggregate (cost=6379566.36..6388410.30 rows=13 width=48) (actual time=76238.071..76239.526 rows=13 loops=1)
Group Key: j.win_start
-> Sort (cost=6379566.36..6380086.58 rows=208088 width=28) (actual time=76238.000..76238.427 rows=5335 loops=1)
Sort Key: j.win_start
Sort Method: quicksort Memory: 609kB
-> Nested Loop (cost=1000.00..6356204.58 rows=208088 width=28) (actual time=23971.722..76237.055 rows=5335 loops=1)
Join Filter: (j.ts_win @> b.ts)
Rows Removed by Join Filter: 208736185
-> Seq Scan on binance_trades b (cost=0.00..3026558.81 rows=16006783 width=28) (actual time=0.033..30328.674 rows=16057040 loops=1)
Filter: ((instrument)::text = ''ETHUSDT''::text)
Rows Removed by Filter: 126872903
-> Materialize (cost=1000.00..208323.11 rows=13 width=30) (actual time=0.000..0.001 rows=13 loops=16057040)
-> Gather (cost=1000.00..208323.05 rows=13 width=30) (actual time=3459.850..3461.076 rows=13 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on scalper_5s_intervals j (cost=0.00..207321.75 rows=5 width=30) (actual time=2336.484..3458.397 rows=4 loops=3)
Filter: ((win_start >= ''2021-08-20 00:00:00''::timestamp without time zone) AND (win_start <= ''2021-08-20 00:01:00''::timestamp without time zone))
Rows Removed by Filter: 5080316
Planning Time: 0.169 ms
Execution Time: 76239.667 ms