2

I have a postgres table "Generation" with half-hourly timestamps spanning 2009 - present with energy data:

[1]: https://i.stack.imgur.com/7cVDu.png

I need to aggregate (average) the data across different intervals from specific timepoints, for example data from 2021-01-07T00:00:00.000Z for one year at 7 day intervals, or 3 months at 1 day interval or 7 days at 1h interval etc. date_trunc() partly solves this, but rounds the weeks to the nearest monday e.g.

SELECT date_trunc('week', "DATETIME") AS week, 
count(*), 
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;

returns the first time series interval as 2021-01-04 with an incorrect count:

week                    count  gas                 coal
"2021-01-04 00:00:00"   192    18291.34375         2321.4427083333335
"2021-01-11 00:00:00"   336    14477.407738095239  2027.547619047619
"2021-01-18 00:00:00"   336    13947.044642857143  1152.047619047619

****EDIT: the following will return the correct weekly intervals by checking the start date relative to the nearest monday / start of week, and adjusts the results accordingly:

WITH vars1 AS (
    SELECT '2021-01-07T00:00:00.000Z'::timestamp  as start_time,
    '2021-01-28T00:00:00.000Z'::timestamp  as end_time
),
vars2 AS (
   SELECT 
      ((select start_time from vars1)::date - (date_trunc('week', (select start_time from vars1)::timestamp))::date) as diff
)

SELECT date_trunc('week', "DATETIME" - ((select diff from vars2) || ' day')::interval)::date + ((select diff from vars2) || ' day')::interval AS week, 
count(*), 
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (select start_time from vars1) AND "DATETIME" < (select end_time from vars1) 
GROUP BY week
ORDER BY week ASC

returns..

week                    count gas                   coal
"2021-01-07 00:00:00"   336   17242.752976190477    2293.8541666666665
"2021-01-14 00:00:00"   336   13481.497023809523    1483.0565476190477
"2021-01-21 00:00:00"   336   15278.854166666666    1592.7916666666667

And then for any daily or hourly (swap out day with hour) intervals you can use the following:

SELECT date_trunc('day', "DATETIME") AS day, 
count(*), 
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2022-01-07T00:00:00.000Z' AND "DATETIME" < '2022-01-10T23:59:59.999Z'
GROUP BY day
ORDER BY day ASC
;
Dan Thory
  • 237
  • 3
  • 14
  • "returns the first time series interval as 2021-01-04 with ..." Can you share more information on why the count is incorrect? – Luuk Feb 14 '22 at 11:28
  • A weekly count should be 336, but as the first time series is treated as 2021-01-04 to 2021-01-10 it discounts half the week as the WHERE clause only counts data from 2021-01-07 – Dan Thory Feb 14 '22 at 11:35
  • You can use: `select date_trunc('week', cast('2021-01-07 00:00:00' as timestamp));`, see this answer: https://stackoverflow.com/a/11762058/724039 (the output shows the start of that week) – Luuk Feb 14 '22 at 11:41
  • Do you mean to get the start of the week relative to the required start date for the week, and then tweak the date_trunc function to adjust it - in this case by 3 days like this ``` SELECT date_trunc('week', "DATETIME" - '3 day'::interval)::date + '3 day'::interval AS week, ``` – Dan Thory Feb 14 '22 at 12:06

2 Answers2

1

In order to select the complete week, you should change the WHERe-clause to something like:

WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp) 
  AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp) + interval '7' day)::date

This will effectively get the records from January 4,2021 until (and including ) January 9,2022

Note: I changed <= to < to stop the end-date being included!

EDIT:

when you want your weeks to start on January 7, you can always group by:

(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7

(where d is the column containing the datetime-value.)

see: dbfiddle

EDIT:

This will get the list from a given date, and a specified interval. see DBFIFFLE

WITH vars AS (
   SELECT 
      '2021-01-07T00:00:00.000Z'::timestamp AS qstart,
      '2022-01-06T23:59:59.999Z'::timestamp AS qend,
      7 as qint,
      INTERVAL '1 DAY' as qinterval
)
SELECT 
   (select date(qstart) FROM vars) + (SELECT qinterval from vars) * ((date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int-(date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int % (SELECT qint FROM vars)))::int) AS week,
   count(*), 
   AVG("GAS") AS gas,
   AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (SELECT qstart FROM vars) AND "DATETIME" <= (SELECT qend FROM vars)
GROUP BY week
ORDER BY week
;

I added the WITH vars to do the variable stuff on top and no need to mess with the rest of the query. (Idea borrowed here)

I only tested with qint=7,qinterval='1 DAY' and qint=14,qinterval='1 DAY' (but others values should work too...)

Luuk
  • 12,245
  • 5
  • 22
  • 33
  • Thanks Luuk, to clarify though I actually want the results from 2021-01-07 and not starting from January 4, 2021 So if my start date is a thursday (like in this case) the time series will run thursday - wednesday for each 7 day interval. I want to avoid rounding to a monday each time. SELECT date_trunc('week', "DATETIME" - '3 day'::interval)::date + '3 day'::interval AS week, forces the change I want but to use this I'd need to check the day of the week each time and add the '3 day' in as a variable, which I guess is fine as a woraround, unless theres a better way? – Dan Thory Feb 14 '22 at 12:26
  • see edit in my answer to give a way solve it. – Luuk Feb 14 '22 at 12:47
  • Thanks @Luuk - I can see that generates the correct time series / intervals but how do I join this to my table, sorry my sql is very basic!! – Dan Thory Feb 14 '22 at 14:27
  • I did do another edit. – Luuk Feb 14 '22 at 20:00
  • This is great, thanks so much Luuk it works perfectly for the 1d or 7d intervals. Could the variables be adjusted to work for 1h - i.e. where each count is just 2 timepoints, i tried this but couldnt get it to work? – Dan Thory Feb 15 '22 at 08:38
  • I do not have sample data to test that. You might want to invest some time in understanding how this can be solved (by reading my answer, and other answers), which might raise your skills in solving this too. Succes! – Luuk Feb 15 '22 at 08:48
  • Okay fair point! Appreciate your help. – Dan Thory Feb 17 '22 at 11:42
0

Using the function EXTRACT you may calculate the difference in days, weeks and hours between your timestamp ts and the start_date as follows

Difference in Days

extract (day from ts - start_date) 

Difference in Weeks

Is the difference in day divided by 7 and truncated

trunc(extract (day from ts - start_date)/7) 

Difference in Hours

Is the difference in day times 24 + the difference in hours of the day

extract (day from ts - start_date)*24 + extract (hour from ts - start_date)

The difference can be used in GROUP BY directly. E.g. for week grouping the first group is difference 0, i.e. same week, the next group with difference 1, the next week, etc.

Sample Example

I'm using a CTE for the start date to avoid multpile copies of the paramater

with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select 
  ts,
  extract (day from ts - (select start_ts from start_time)) day_diff,
  trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
  extract (day from ts - (select start_ts from start_time)) *24 + extract (hour from ts - (select start_ts from start_time)) hour_diff,
  value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1
Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53