DB design
While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would adapt:
ALTER TABLE tbl ADD COLUMN ts timestamp;
UPDATE tbl SET ts = date + time;  -- assuming actual date and time types
ALTER TABLE tbl DROP COLUMN date, DROP COLUMN time;
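As an aside, a sketch of the same migration done by converting one of the existing columns in place - still assuming actual date and time types; adapt names as needed:

ALTER TABLE tbl ALTER COLUMN date TYPE timestamp USING date + time;  -- rewrite date into a timestamp
ALTER TABLE tbl RENAME COLUMN date TO ts;
ALTER TABLE tbl DROP COLUMN time;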
If date and time are not actual date and time data types, use to_timestamp() instead.
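For example, a minimal sketch assuming date and time are stored as text in ISO-like format - the exact format pattern is an assumption and must match your actual data:

UPDATE tbl
SET    ts = to_timestamp(date || ' ' || time, 'YYYY-MM-DD HH24:MI:SS');
-- format pattern is an assumption;
-- to_timestamp() returns timestamptz, which is cast to timestamp on assignment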
Query
Then the query is a bit simpler:
SELECT *
FROM  (
   SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    ts >= '2018-01-01'
   AND    ts <  '2018-01-02'
   GROUP  BY 1
   ) grid
CROSS  JOIN LATERAL (
   SELECT round(avg(vin1), 2) AS vin1_av
        , round(avg(vin2), 2) AS vin2_av
        , round(avg(vin3), 2) AS vin3_av
   FROM   tbl
   WHERE  sn = grid.sn
   AND    ts >= grid.ts
   AND    ts <  grid.ts + interval '5 min'
   ) avg;
db<>fiddle here
Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.
Join to rows that fall in each partition with a LATERAL join and immediately aggregate averages in the subquery avg. Due to the aggregates, it always returns a row even if no entries are found. Averages default to NULL in this case.
The result includes all time slots between the first and last qualifying row in the given time frame. Various other result compositions would make sense, too, like including all time slots in the given time frame or only time slots with actual values. All are possible; I had to pick one interpretation.
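For instance, a minimal sketch of the variant covering every 5-minute slot in the given time frame, whether or not matching rows exist - assuming the reworked table with a single ts column as above, and numeric vin* columns like in the demo:

SELECT grid.ts, avg.*
FROM   generate_series(timestamp '2018-01-01'
                     , timestamp '2018-01-02' - interval '5 min'
                     , interval '5 min') grid(ts)
CROSS  JOIN LATERAL (
   SELECT round(avg(vin1), 2) AS vin1_av
        , round(avg(vin2), 2) AS vin2_av
        , round(avg(vin3), 2) AS vin3_av
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    ts >= grid.ts
   AND    ts <  grid.ts + interval '5 min'
   ) avg;

Slots without rows come out with NULL averages, same as above.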
Index
At least have this multicolumn index:
CREATE INDEX foo_idx ON tbl (sn, ts);
Or on (sn, ts, vin1, vin2, vin3) to allow index-only scans - if some preconditions are met, and especially if table rows are much wider than in the demo.
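For illustration, a sketch of that covering variant - the index name is arbitrary; on Postgres 11 or later the payload columns can go into an INCLUDE clause instead:

CREATE INDEX foo_covering_idx ON tbl (sn, ts, vin1, vin2, vin3);
-- Postgres 11+: keep (sn, ts) as key columns and append the payload:
-- CREATE INDEX foo_covering_idx ON tbl (sn, ts) INCLUDE (vin1, vin2, vin3);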
Based on your original table
As requested and clarified in the comment, and later updated again in the question to include the columns mac and loc: I assume you want separate averages per (mac, loc). date and time are still separate columns, the vin* columns are type float, and time slots without rows are excluded.
The updated query also moves the set-returning function generate_series() to the FROM list, which is cleaner before Postgres 10:
SELECT t.mac, sn.sn, t.loc, ts.ts::time AS time, ts.ts::date AS date
     , t.vin1_av, t.vin2_av, t.vin3_av
FROM  (SELECT text '4as11111111') sn(sn)  -- provide sn here once
CROSS  JOIN LATERAL (
   SELECT min(date+time) AS min_ts, max(date+time) AS max_ts
   FROM   tbl
   WHERE  sn = sn.sn
   AND    date+time >= '2018-01-01 0:0'   -- provide time frame here
   AND    date+time <  '2018-01-02 0:0'
   ) grid
CROSS  JOIN LATERAL generate_series(min_ts, max_ts, interval '5 min') ts(ts)
CROSS  JOIN LATERAL (
   SELECT mac, loc
        , round(avg(vin1)::numeric, 2) AS vin1_av  -- cast to numeric for round()
        , round(avg(vin2)::numeric, 2) AS vin2_av  -- but rounding is optional
        , round(avg(vin3)::numeric, 2) AS vin3_av
   FROM   tbl
   WHERE  sn = sn.sn
   AND    date+time >= ts.ts
   AND    date+time <  ts.ts + interval '5 min'
   GROUP  BY mac, loc
   HAVING count(*) > 0                    -- exclude empty slots
   ) t;
Create a multicolumn expression index to support this:
CREATE INDEX bar_idx ON tbl (sn, (date+time));
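The queries above can only use this expression index because they filter on the same expression date + time that the index is built on.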
db<>fiddle here
But I would much rather use timestamp all along.