Compute running sum in a window function

Question

I have issues with this running sum in Redshift (uses Postgres 8):

select extract(month from registration_time) as month
 , extract(week from registration_time)%4+1 as week
 , extract(day from registration_time) as day
 , count(*) as count_of_users_registered
 , sum(count(*)) over (ORDER BY (1,2,3))
from loyalty.v_user
group by 1,2,3
order by 1,2,3
;

The error I get is:

ERROR: 42601: Aggregate window functions with an ORDER BY clause require a frame clause

You can't use column numbers for the order by in a window definition (and `(1,2,3)` is something different than `1,2,3` - don't use useless parentheses). `over (order by registration_time)` should do what you want — , Nov 06 '15 at 20:03
Still getting the same error with using `order by registration_time`. Is my syntax for the `sum` function itself correct? — simplycoding, Nov 06 '15 at 20:23

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

You can run a window functions on the result of aggregate function on the same query level. It's just much simpler to use a subquery in this case:

SELECT *, sum(count_registered_users) OVER (ORDER BY month, week, day) AS running_sum
FROM  (
   SELECT extract(month FROM registration_time)::int     AS month
        , extract(week  FROM registration_time)::int%4+1 AS week
        , extract(day   FROM registration_time)::int     AS day
        , count(*) AS count_registered_users
   FROM   loyalty.v_user
   GROUP  BY 1, 2, 3
   ORDER  BY 1, 2, 3
   ) sub;

I also fixed the syntax for expression computing week. extract() returns double precision, but the modulo operator % does not accept double precision numbers. I cast all three to integer while being at it.

Like @a_horse commented, you cannot use positional references in the ORDER BY clause of a window function (unlike in the ORDER BY clause of the query).

However, you cannot use over (order by registration_time) either in this query, since you are grouping by month, week, day. registration_time is neither aggregated nor in the GROUP BY clause as would be required. At that stage of the query evaluation, you cannot access the column any more.

You could repeat the expressions of the first three SELECT items in the ORDER BY clause to make it work:

SELECT extract(month FROM registration_time)::int     AS month
     , extract(week  FROM registration_time)::int%4+1 AS week
     , extract(day   FROM registration_time)::int     AS day
     , count(*) AS count_registered_users
     , sum(count(*)) OVER (ORDER BY 
              extract(month FROM registration_time)::int
            , extract(week  FROM registration_time)::int%4+1
            , extract(day   FROM registration_time)::int) AS running_sum
FROM   loyalty.v_user
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3;

But that seems rather noisy. (Performance would be good, though.)

Aside: I do wonder about the purpose behind week%4+1 ... The whole query might be simpler.

Yeah your answer and suggestion make sense. I was getting the same error until I tried including `rows unbounded preceding` after the `order by month, week, day` clause and it worked. So for anyone who comes across this in the future, it looks like that `rows unbounded preceding` or whatever option you want is required. Again, AWS documentation is not clear about this since this is on Redshift — simplycoding, Nov 08 '15 at 22:31
@simplycoding: Redshift is something you should mention in the question to begn with, since it's not Postgres. Derived from it, but substantially different. The code above works in Postgres as is, where, [quote](http://www.postgresql.org/docs/current/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS), `The default framing option is RANGE UNBOUNDED PRECEDING`. — Erwin Brandstetter, Nov 09 '15 at 00:10

Compute running sum in a window function

1 Answers1