I am trying to create a 3-month moving average (3MMA) from some data that I have, using Redshift SQL or Domo BeastMode (if anyone is familiar with that).

The data is recorded day by day, but needs to be displayed by month. So the quotes/revenue need to be summarized by month, and then a 3MMA needs to be calculated, excluding the current month.

So, if the quote was in April, I would need the average of Jan, Feb, Mar.

The input data looks like this:

Quote Date MM/DD/YYYY     Revenue
3/24/2015                 61214
8/4/2015                  22983
9/3/2015                  30000
9/15/2015                 171300
9/30/2015                 112000

And I need the output to look something like this:

Month               Revenue             3MMA
Jan 2015            =Sum of Jan Rev     =(Oct14 + Nov14 + Dec14) / 3
Feb 2015            =Sum of Feb Rev     =(Nov14 + Dec14 + Jan15) / 3
March 2015          =Sum of Mar Rev     =(Dec14 + Jan15 + Feb15) / 3
April 2015          =Sum of Apr Rev     =(Jan15 + Feb15 + Mar15) / 3
May 2015            =Sum of May Rev     =(Feb15 + Mar15 + Apr15) / 3
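To pin down exactly what I mean, here is the calculation in plain Python (numbers come from the sample rows above; months with no quotes are treated as 0 here):

```python
from collections import defaultdict
from datetime import date

# Sample quotes from above: (quote date, revenue)
quotes = [
    (date(2015, 3, 24), 61214),
    (date(2015, 8, 4), 22983),
    (date(2015, 9, 3), 30000),
    (date(2015, 9, 15), 171300),
    (date(2015, 9, 30), 112000),
]

# Step 1: sum revenue by calendar month.
monthly = defaultdict(int)
for d, rev in quotes:
    monthly[(d.year, d.month)] += rev

# Step 2: for each month, average the three previous calendar months
# (the current month is excluded from its own average).
def three_mma(year, month):
    total = 0
    for back in (1, 2, 3):
        y, m = year, month - back
        while m < 1:
            m += 12
            y -= 1
        total += monthly.get((y, m), 0)
    return total / 3

print(monthly[(2015, 9)])   # Sep 2015 revenue: 30000 + 171300 + 112000 = 313300
print(three_mma(2015, 9))   # (Jun + Jul + Aug) / 3 = (0 + 0 + 22983) / 3
```
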

If anyone is able to help, I would be extremely grateful! I have been stuck on this for quite a while and have no idea what I'm doing when it comes to SQL lol.

Cheers, Logan.


3 Answers

You can do this using aggregation and window functions:

select date_trunc('month', quotedate) as mon,
       sum(revenue) as mon_revenue,
       avg(sum(revenue)) over (order by date_trunc('month', quotedate)
                               rows between 2 preceding and current row) as revenue_3mon
from t
group by date_trunc('month', quotedate) 
order by mon;

Note: this uses average, so for the first and second row, it will divide by 1 and 2 respectively. It also assumes that you have at least one record for each month.
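A plain-Python sketch of how that frame behaves on the first rows (hypothetical numbers, one summed row per month):

```python
# Monthly revenue totals, already summed and ordered by month
# (hypothetical numbers, one row per month).
monthly = [("2015-01", 100), ("2015-02", 200), ("2015-03", 300), ("2015-04", 400)]

# avg(...) over (rows between 2 preceding and current row): each row
# averages itself plus up to two prior rows, so the first row divides
# by 1 and the second by 2, as noted above.
averages = []
for i, (mon, _) in enumerate(monthly):
    frame = [rev for _, rev in monthly[max(0, i - 2): i + 1]]
    averages.append((mon, sum(frame) / len(frame)))

print(averages)  # ('2015-01', 100.0), ('2015-02', 150.0), then full 3-row frames
```
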

EDIT:

I wonder if there is an issue with aggregate functions mixed with analytic functions in Redshift. Is the following any better:

select m.*,
       avg(mon_revenue) over (order by mon rows between 2 preceding and current row) as revenue_3mon
from (select date_trunc('month', quotedate) as mon,
             sum(revenue) as mon_revenue
      from t
      group by date_trunc('month', quotedate) 
     ) m
order by mon;
Gordon Linoff
  • Hi Gordon, thanks for your response :) pressed enter too early... editing now. – user2427023 Mar 20 '16 at 22:37
  • Hi Gordon, thanks for your response :) It looks like it's well on its way to doing what I want; however, the 3MMA output doesn't look 100% correct. This is what it's currently outputting: 2012-10-01 00:00:00 17777 17777 2013-01-01 00:00:00 182367 82466 2013-09-01 00:00:00 3120 324974 2013-12-01 00:00:00 291730 231895.33333333334. The dates also don't look as if they are ordering properly as I go further down the data. Any suggestions would be great! Cheers, Logan. – user2427023 Mar 20 '16 at 22:42
  • Unfortunately, it looks as if it is giving the same output :( I am currently installing PostgreSQL on my PC so I can make sure that it's not a Domo/Redshift issue. Even in a different scenario, the group by/order by clauses are causing issues. – user2427023 Mar 21 '16 at 02:07
You cannot use aggregate functions and analytic functions together like that; the query should be:

select m.*,
       avg(mon_revenue) over (order by mon rows between 3 preceding and 1 preceding) as revenue_3mon -- using 3 preceding and 1 preceding row you exclude the current row
from (select date_trunc('month', quotedate) as mon,
             sum(revenue) as mon_revenue
      from t
      group by date_trunc('month', quotedate) 
     ) m
order by mon;

rows between 3 preceding and 1 preceding ends the frame before the current row (you need to drop the current row from the end of the frame, otherwise Redshift won't give you the result you want).
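A plain-Python sketch (hypothetical numbers) of how this frame excludes the current row:

```python
# Monthly totals in month order (hypothetical numbers).
monthly = [("2015-01", 100), ("2015-02", 200), ("2015-03", 300),
           ("2015-04", 400), ("2015-05", 500)]

# rows between 3 preceding and 1 preceding: the frame ends *before* the
# current row, so each month averages only the three months before it.
averages = []
for i, (mon, _) in enumerate(monthly):
    frame = [rev for _, rev in monthly[max(0, i - 3): i]]
    averages.append((mon, sum(frame) / len(frame) if frame else None))

print(averages)  # '2015-04' -> (100 + 200 + 300) / 3 = 200.0
```
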

sushmit
You could do something like the way we create buckets for a rolling six weeks (the date column is called "date"):

case 
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,1,current_date)) then 'CW'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-6,current_date)) then 'LW'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-13,current_date)) then '2W'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-20,current_date)) then '3W'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-27,current_date)) then '4W'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-34,current_date)) then '5W'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-41,current_date)) then '6W'  
end as dateweek

You could then create an average in a subsequent step in the dataflow...
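The same bucketing idea can be sketched in plain Python; `week_start` and `week_bucket` below are hypothetical helpers mimicking the `date_trunc('week', dateadd(day, 1, ...))` trick above, which shifts each date by one day so weeks effectively start on Sunday:

```python
from datetime import date, timedelta

def week_start(d):
    # Mimics date_trunc('week', dateadd(day, 1, d)): snap d back to the
    # most recent Sunday (Python's weekday() has Monday = 0).
    return d - timedelta(days=(d.weekday() + 1) % 7)

def week_bucket(d, today):
    # 'CW' = current week, 'LW' = last week, '2W'..'6W' = further back;
    # anything older than six weeks falls outside the buckets.
    labels = ["CW", "LW", "2W", "3W", "4W", "5W", "6W"]
    diff = (week_start(today) - week_start(d)).days // 7
    return labels[diff] if 0 <= diff < len(labels) else None

today = date(2016, 3, 21)
print(week_bucket(date(2016, 3, 21), today))  # CW
print(week_bucket(date(2016, 3, 14), today))  # LW
```
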

WHM