
I'm trying to generate rolling averages over set time periods grouped by an item ID column.

Here's the basic layout of the table and some dummy data, with the fluff stripped out:

----------------------------------------------------
| id   | itemid   | isup   | logged                |
----------------------------------------------------
| 1    | 1        | true   | 2017-03-23 12:55:00   |
| 2    | 1        | false  | 2017-03-23 12:57:00   |
| 3    | 1        | true   | 2017-03-23 13:07:00   |
| 4    | 1        | false  | 2017-03-23 13:09:00   |
| 5    | 1        | true   | 2017-03-23 13:50:00   |
| 6    | 2        | false  | 2017-03-23 12:55:00   |
| 7    | 2        | true   | 2017-03-23 14:00:00   |
| 8    | 2        | false  | 2017-03-23 14:03:00   |
----------------------------------------------------

I found an answer to a previous question on rolling averages, but I can't quite figure out how to group the averages by the item ID; pretty much every avenue I've gone down has ended up with the statistics being flat-out wrong.

Here's my starting point - I have a feeling that my lack of understanding of ROW_NUMBER() OVER isn't helping matters.

SELECT id, itemid, AVG(isup)
    OVER (PARTITION BY groupnr ORDER BY logged) AS averagehour
FROM (
    SELECT id, itemid, isup, logged, intervalgroup,
        itemid - ROW_NUMBER() OVER (
            PARTITION BY intervalgroup ORDER BY logged) AS groupnr
    FROM (
        SELECT id, itemid, logged,
            CASE WHEN isup = TRUE THEN 1 ELSE 0 END AS isup,
            'epoch'::TIMESTAMP + '3600 seconds'::INTERVAL *
                (EXTRACT(EPOCH FROM logged)::INT4 / 3600) AS intervalgroup
        FROM uplog
    ) alias_inner
) alias_outer
ORDER BY logged;

Any help would be much appreciated.

GMemory
    "Grouped rolling averages" is very general. Please describe what you are trying to do. Sample data and desired results help. – Gordon Linoff Mar 24 '17 at 13:41

2 Answers


The linked answer contains almost everything you need. If you want to "group" further (f.ex. by itemid), you'll just need to add those "groups" to the PARTITION BY clauses of the window functions:

select   *, avg(isup::int) over (partition by itemid, group_nr order by logged) as rolling_avg
from     (
    select *, id - row_number() over (partition by itemid, interval_group order by logged) as group_nr
    from   (
        select *, 'epoch'::timestamp + '3600 seconds'::interval * (extract(epoch from logged)::int4 / 3600) as interval_group
        from   dummy
    ) t1
) t2
order by itemid, logged

Note however that this (and the linked answer) works only because id has no gaps and is in the same order as the timestamp column of its table. If that's not the case, you'll need

row_number() over (partition by itemid order by logged) - row_number() over (partition by itemid, interval_group order by logged) as group_nr

instead of id - row_number() ....
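Putting the gap-free variant together, here is a sketch of the full query against the question's uplog table (using date_trunc for the hourly bucket, see below):

select   *, avg(isup::int) over (partition by itemid, group_nr order by logged) as rolling_avg
from     (
    select *,
           row_number() over (partition by itemid order by logged)
         - row_number() over (partition by itemid, interval_group order by logged) as group_nr
    from   (
        select *, date_trunc('hour', logged) as interval_group
        from   uplog
    ) t1
) t2
order by itemid, logged;

The difference of the two row_number() calls is constant within each run of consecutive rows that share an hourly bucket, so it serves the same "island" role that id - row_number() played, without relying on id at all.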

http://rextester.com/YBSC43615

Also, if you're going to use only hourly groups, you can use:

date_trunc('hour', logged) as interval_group

instead of the more general arithmetic (as @LaurenzAlbe already noticed).
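For instance, both expressions bucket a timestamp down to the start of its hour (assuming logged is a plain timestamp):

select date_trunc('hour', t) as truncated,
       'epoch'::timestamp + '3600 seconds'::interval
           * (extract(epoch from t)::int4 / 3600) as floored
from   (values ('2017-03-23 13:09:00'::timestamp)) v(t);
-- both columns yield 2017-03-23 13:00:00

date_trunc also generalizes cleanly to other units ('day', 'minute', etc.), whereas the epoch arithmetic has to be adjusted by hand for each interval length.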

pozs

My answer assumes

  1. that logged is a timestamp with time zone, which is the only reasonable data type for logging.

  2. that your complicated date arithmetic is supposed to calculate the value of logged at time zone UTC (else why would you use 'epoch'::timestamp as a base?), rounded down to the hour.

  3. that you want to group by that rounded timestamp and itemid.

Here would be an answer:

SELECT *,
       avg(isup::integer)
          OVER (PARTITION BY itemid,
                             date_trunc('hour', logged AT TIME ZONE 'UTC')
               ) average,
       date_trunc('hour', logged AT TIME ZONE 'UTC') avg_interval
FROM uplog
ORDER BY logged;

┌────┬────────┬──────┬────────────────────────┬────────────────────────┬─────────────────────┐
│ id │ itemid │ isup │         logged         │        average         │    avg_interval     │
├────┼────────┼──────┼────────────────────────┼────────────────────────┼─────────────────────┤
│  6 │      2 │ f    │ 2017-03-23 12:55:00+01 │ 0.00000000000000000000 │ 2017-03-23 11:00:00 │
│  1 │      1 │ t    │ 2017-03-23 12:55:00+01 │ 0.50000000000000000000 │ 2017-03-23 11:00:00 │
│  2 │      1 │ f    │ 2017-03-23 12:57:00+01 │ 0.50000000000000000000 │ 2017-03-23 11:00:00 │
│  3 │      1 │ t    │ 2017-03-23 13:07:00+01 │ 0.66666666666666666667 │ 2017-03-23 12:00:00 │
│  4 │      1 │ f    │ 2017-03-23 13:09:00+01 │ 0.66666666666666666667 │ 2017-03-23 12:00:00 │
│  5 │      1 │ t    │ 2017-03-23 13:50:00+01 │ 0.66666666666666666667 │ 2017-03-23 12:00:00 │
│  7 │      2 │ t    │ 2017-03-23 14:00:00+01 │ 0.50000000000000000000 │ 2017-03-23 13:00:00 │
│  8 │      2 │ f    │ 2017-03-23 14:03:00+01 │ 0.50000000000000000000 │ 2017-03-23 13:00:00 │
└────┴────────┴──────┴────────────────────────┴────────────────────────┴─────────────────────┘
(8 rows)
Laurenz Albe