3

I'm trying to learn SQL so be patient with me. I'm using PostgreSQL 9.3

I want to average a column based on a window of dates. I've been able to write window functions that accomplish this with a set interval but I'd like to be able to be able to do this with a growing interval. By this I mean:

average values from date_0 to date_1
average values from date_0 to date_2
average values from date_0 to date_3
..... so date date_0 stays the same and date_x grows and creates a larger sample

I'm assuming there is a better way than running a query for each range I'd like to average. Any advice is appreciated. Thank you.

EDIT

I'm trying to create evenly spaced bins to be used to aggregate values of a table over.
I come to the interval by:

(MAX(date) - MIN(date)) / bins

where date is the column of a table
and bins is the number of bins I'd like to divide the table into.

date_0 = MIN(date)
date_n = MIN(date) + (interval * n)

Community
  • 1
  • 1
Ryder Brooks
  • 2,049
  • 2
  • 21
  • 29

2 Answers2

4

I suggest the handy function width_bucket():

To get the average for each time segment ("bin"):

SELECT width_bucket(extract(epoch FROM t.the_date)
                  , x.min_epoch, x.max_epoch, x.bins) AS bin
     , avg(value) AS bin_avg
FROM   tbl t
    , (SELECT extract(epoch FROM min(the_date)) AS min_epoch
            , extract(epoch FROM max(the_date)) AS max_epoch
            , 10 AS bins
       FROM   tbl t
      ) x
GROUP  BY 1;

To get the "running average" over the (step-by-step) growing time interval:

SELECT bin, round(sum(bin_sum) OVER w /sum(bin_ct) OVER w, 2) AS running_avg
FROM  (
   SELECT width_bucket(extract(epoch FROM t.the_date)
                     , x.min_epoch, x.max_epoch, x.bins) AS bin
        , sum(value) AS bin_sum
        , count(*)   AS bin_ct
   FROM   tbl t
       , (SELECT extract(epoch FROM min(the_date)) AS min_epoch
               , extract(epoch FROM max(the_date)) AS max_epoch
               , 10 AS bins
          FROM   tbl t
         ) x
   GROUP  BY 1
   ) sub
WINDOW w AS (ORDER BY bin)
ORDER  BY 1;

Using the_date instead of date as column name, avoiding reserved words as identifiers.
Since width_bucket() is currently only implemented for double precision and numeric, I extract epoch values from the_date. Details here:
Aggregating (x,y) coordinate point clouds in PostgreSQL

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I keep getting "ERROR: function width_bucket(date, date, date, integer) does not exist. HINT: No function matches the given name and argument types. You might need to add explicit type casts." – Ryder Brooks Jul 02 '14 at 00:23
  • @user3204587: My oversight. `width_bucket()` is only implemented for `dp` and `numeric`. I updated my answer accordingly. – Erwin Brandstetter Jul 02 '14 at 00:28
  • I'm up voting regardless because of how helpful you've been and for introducing me to `width_bucket()` but this is returning the average over the constant interval of a bucket correct? Not the growing interval from date_0 to date_n where date_n = MIN(the_date) + (interval * n)? – Ryder Brooks Jul 02 '14 at 00:49
  • @user3204587: This returns the average for each of ten equally long time intervals covering the complete time range in the table. Should be what the Q asks for. Ah! .. now I get it, your intervals all start from `date_0` ... – Erwin Brandstetter Jul 02 '14 at 01:01
  • @user3204587: Added a solution to compute average values for your "growing intervals". – Erwin Brandstetter Jul 02 '14 at 01:26
3

If you have a set of data, you can readily get what you want in separate columns:

select avg(case when date between date_0 and date_1 then value end) as avg1,
       avg(case when date between date_0 and date_2 then value end) as avg2,
       . . .
       avg(case when date between date_0 and date_n then value end) as avgn            
from table t
where date >= date_0;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786