2

I have a query that does a basic moving average using the FOLLOWING / PRECEDING syntax of PostgreSQL 9.0. To my horror I discovered our pg server runs on 8.4 and there is no scope to get an upgrade in the near future.

I am therefore looking for the simplest way to make a backwards compatible query of the following:

SELECT time_series,
       avg_price AS daily_price,
       CASE WHEN row_number() OVER (ORDER BY time_series) > 7 
        THEN avg(avg_price) OVER (ORDER BY time_series DESC ROWS BETWEEN 0 FOLLOWING
                                                                     AND 6 FOLLOWING)
        ELSE NULL 
       END AS avg_price
FROM (
   SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series,
          SUM(price) / COUNT(itemname) AS avg_price
   FROM auction_prices 
   WHERE itemname = 'iphone6_16gb' AND price < 1000
   GROUP BY time_series
   ) sub

It is a basic 7-day moving average for a table containing price and timestamp columns:

closing_date timestamp
price        numeric
itemname     text

The requirement for basic is due to my basic knowledge of SQL.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
CharlieSmith
  • 311
  • 4
  • 12
  • 1
    It looks like there is *at least* a `) alias;` missing from the end of your query. – wildplasser May 07 '15 at 19:24
  • It would be decent to provide some demo values to work with. – Erwin Brandstetter May 07 '15 at 23:04
  • Other considerations aside, I would question how you calculate the weekly average. The average of daily averages can be *very* misleading. If you sell 100 phones for 900 bucks on Monday and 1 phone for 100 bucks on Friday, you calculate a weekly average of 500, while it should really be 892 ... Is that what you intend? – Erwin Brandstetter May 08 '15 at 12:48

3 Answers3

1

Postgres 8.4 already has CTEs.
I suggest to use that, calculate the daily average in a CTE and then self-join to all days (existing or not) in the past week. Finally, aggregate once more for the weekly average:

WITH cte AS (
   SELECT closing_date::date AS closing_day
        , sum(price)   AS day_sum
        , count(price) AS day_ct
   FROM   auction_prices
   WHERE  itemname = 'iphone6_16gb'
   AND    price <= 1000  -- including upper border
   GROUP  BY 1
   )   
SELECT d.closing_day
     , CASE WHEN d.day_ct > 1
            THEN d.day_sum / d.day_ct
            ELSE d.day_sum
       END AS avg_day         -- also avoids division-by-zero
     , CASE WHEN sum(w.day_ct) > 1
            THEN sum(w.day_sum) / sum(w.day_ct)
            ELSE sum(w.day_sum)
       END AS week_avg_proper  -- also avoids division-by-zero
FROM   cte d
JOIN   cte w ON w.closing_day BETWEEN d.closing_day - 6 AND d.closing_day
GROUP  BY d.closing_day, d.day_sum, d.day_ct
ORDER  BY 1;

SQL Fiddle. (Running on Postgres 9.3, but should work in 8.4, too.)

Notes

  • I used a different (correct) algorithm to calculate the weekly average. See considerations in my comment to the question.

  • This calculates averages for every day in the base table, including corner cases. But no row for days without any rows.

  • One can subtract integer from date: d.closing_day - 6. (But not from varchar or timestamp!)

  • It's rather confusing that you call a timestamp column closing_date - it's not a date, it's a timestamp. And time_series for the resulting column with a date value? I use closing_day instead ...

  • Note how I count prices count(price), not items COUNT(itemname) - which would be an entry point for a sneaky error if either of the columns can be NULL. If neither can be NULL count(*) would be superior.

  • The CASE construct avoids division-by-zero errors, which can occur as long as the column you are counting can be NULL. I could use COALESCE for the purpose, but while being at it I simplified the case for exactly 1 price as well.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    Wunderbar! worked directly out of the box. I very much appreciate your comments and suggestions. Averages of daily averages was also something we overlooked. Many thanks. – CharlieSmith May 08 '15 at 17:28
0

PostgreSQL 8.4.... wasn't that in the day when everybody thought Windows 95 was great? Anyway...

The only option I can think of is to use a stored procedure with a scrollable cursor and do the math manually:

CREATE FUNCTION auction_prices(item text, price_limit real)
  RETURNS TABLE (closing_date timestamp, avg_day real, avg_7_day real) AS $$
DECLARE
  last_date  date;
  first_date date;
  cur        refcursor;
  rec        record;
  dt         date;
  today      date;
  today_avg  real;
  p          real;
  sum_p      real;
  n          integer;
BEGIN
  -- There may be days when an item was not traded under the price limit, so need a
  -- series of consecutive days to find all days. Find the end-points of that
  -- interval.
  SELECT max(closing_date), min(closing_date) INTO last_date, first_date
  FROM auction_prices
  WHERE itemname = item AND price < price_limit;

  -- Need at least some data, so quit if item was never traded under the price limit.
  IF NOT FOUND THEN
    RETURN;
  END IF;

  -- Create a scrollable cursor over the auction_prices daily average and the
  -- series of consecutive days. The LEFT JOIN means that you will get a NULL
  -- for avg_price on days without trading.
  OPEN cur SCROLL FOR
    SELECT days.dt, sub.avg_price
    FROM generate_series(last_date, first_date, interval '-1 day') AS days(dt)
    LEFT JOIN (
      SELECT sum(price) / count(itemname) AS avg_price
      FROM auction_prices 
      WHERE itemname = item AND price < price_limit
      GROUP BY closing_date
    ) sub ON sub.closing_date::date = days.dt::date;

  <<all_recs>>
  LOOP -- over the entire date series
    -- Get today's data (today = first day of 7-day period)
    FETCH cur INTO today, today_avg;
    EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
    IF today_avg IS NULL THEN
      n := 0;
      sum_p := 0.0;
    ELSE
      n := 1;
      sum_p := today_avg;
    END IF;

    -- Loop over the remaining 6 days
    FOR i IN 2 .. 7 LOOP
      FETCH cur INTO dt, p;
      EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
      IF p IS NOT NULL THEN
        sum_p := sum_p + p;
        n := n + 1;
      END IF;
    END LOOP;

    -- Save the data to the result set
    IF n > 0 THEN
      RETURN NEXT today, today_avg, sum_p / n;
    ELSE
      RETURN NEXT today, today_avg, NULL;
    END IF;

    -- Move the cursor back to the starting row of the next 7-day period
    MOVE RELATIVE -6 FROM cur;
  END LOOP all_recs;
  CLOSE cur;

  RETURN;
END; $$ LANGUAGE plpgsql STRICT;

A few notes:

  • There may be dates when an item is not traded under the limit price. In order to get accurate moving averages, you need to include those days. Generate a series of consecutive dates during which the item was indeed traded under the limit price and you will get accurate results.
  • The cursor needs to be scrollable such that you can look forward 6 days to earlier dates to get data needed for the calculation, and then move back 6 days to calculate the average for the next day.
  • You cannot calculate a moving average on the last 6 days. The simple reason is that the MOVE command needs a constant number of records to move. Parameter substitution is not supported. On the up side, your moving average will always be for 7 days (of which not all may have seen trading).
  • This function will by no means be fast, but it should work. No guarantees though, I have not worked on an 8.4 box for years.

Use of this function is rather straightforward. Since it is returning a table you can use it in a FROM clause like any other table (and even JOIN to other relations):

SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series, avg_day, avg_7_day
FROM auction_prices('iphone6_16gb', 1000);
Patrick
  • 29,357
  • 6
  • 62
  • 90
  • On a first glance I see a couple of problems: Must be `FOR i IN 2 .. 7`, not `2 TO 7`. `RETURN NEXT` cannot have parameters in function with `RETURNS TABLE`. This won't work as is ... – Erwin Brandstetter May 08 '15 at 13:48
  • Bummer. So should `RETURN SETOF`, presumably. Was assuming CTE's weren't around in 8.0. – Patrick May 08 '15 at 15:40
  • You can make it work with `RETURNS TABLE`, too: http://stackoverflow.com/questions/14039720/plpgsql-error-return-next-cannot-have-a-parameter-in-function-with-out-paramete/14039866#14039866. And you are right about CTEs (introduced with 8.4, not there in 8.0), but it seems you are confusing versions ... – Erwin Brandstetter May 08 '15 at 15:47
  • Yes. Not around in 8.4, not around until 9.0. But that was a bad assumption. – Patrick May 08 '15 at 15:50
0
        -- make a subset and rank it on date
WITH xxx AS (
        SELECT
        rank() OVER(ORDER BY closing_date) AS rnk
        , closing_date
        , price
        FROM auction_prices
        WHERE itemname = 'iphone6_16gb' AND price < 1000
        )
        -- select subset, + aggregate on self-join
SELECT this.*
        , (SELECT AVG(price) AS mean
                FROM xxx that
                WHERE that.rnk > this.rnk + 0 -- <<-- adjust window
                AND that.rnk < this.rnk + 7   -- <<-- here
                )
FROM xxx this
ORDER BY this.rnk
        ;
  • Note: the CTE is for conveniance (Postgres-8.4 does have CTE's), but the CTE could be replaced by a subquery or, more elegantly, by a view.
  • The code assumes that the time series is has no gaps (:one opservation for every {product*day}. When not: join with a calendar table (which could also contain the rank.)
  • (also note that I did not cover the corner cases.)
joop
  • 4,330
  • 1
  • 15
  • 26
  • You can't easily replace the CTE with a subquery, since it's referenced on two different levels. A view would work, though. Also, I don't think you can assume a single row per day in the base table. The subquery in the Q indicates multiple rows per day. – Erwin Brandstetter May 08 '15 at 16:08
  • Yes. The other way would be to specify the aggregation-window in terms of `that.stamp > this.stamp and that.stamp <= this.stamp + xxx:interval`. The current answer always uses six (or less) records in the forward time-window. – joop May 08 '15 at 16:16