0

I want to calculate avg or sum (metric) over (past 31 days) to use it in visualization.

"metric" varies every days but it always jumps at the first day of calendar months.

The problem is that some months (like November) are 30-day long. So this function actually includes two first day of the months on first of December (run the query below and check the row at 2021-12-01T00:00:00Z).

I need avg (metric) over (past 30 days) if we have two first days of the months in the window and (past 31 days) otherwise.

with days as(
  select
    '2021-10-01' :: timestamptz + (d || ' day') ::interval as "day"
  from generate_series(0, 100) d
 )
, daily_metrics as (
  select 
    "day"
  -- in reality "metric" fluctuates every day. But it jumps on the first day of the months
  , case when extract(day from "day") = 1 then 300 else 100 end :: float as metric
  from days
 )
 , result as (
  select 
    "day"
  , avg(metric) over (rows between 30 preceding and current row) as metric_roll_avg
  from daily_metrics
)

select * from result 
where "day" > '2021-10-01' :: timestamptz + '30 day' :: interval 
homam
  • 1,945
  • 1
  • 19
  • 26
  • How about this topic: https://stackoverflow.com/questions/11743810/how-to-get-first-and-last-day-of-previous-month-with-timestamp-in-sql-server/39447814 ? – Bartosz Olchowik Jan 11 '22 at 12:08

1 Answers1

0

This is what I ended up doing:

First crate a calendar view, each rows for the days of the calendar

with
  calendar as (
    select c."day"
    , extract(day from c."day") = 1 and extract(day from date_trunc('month', c."day") - interval '1 day') = 30 as last_month_30
    , extract(day from c."day") in (1, 2) and extract(day from date_trunc('month', c."day") - interval '1 day') = 29 as last_month_29
    , extract(day from c."day") in (1, 2, 3) and extract(day from date_trunc('month', c."day") - interval '1 day') = 28 as last_month_28
  from ... as c ...
) 

Which returns this view:

+------------+---------------+---------------+---------------+
| day        | last_month_30 | last_month_29 | last_month_28 |
|------------+---------------+---------------+---------------|
| 2022-02-26 | False         | False         | False         |
| 2022-02-27 | False         | False         | False         |
| 2022-02-28 | False         | False         | False         |
| 2022-03-01 | False         | False         | True          |
| 2022-03-02 | False         | False         | True          |
+------------+---------------+---------------+---------------+

And using a case switch:

select 
  "day"
, case 
  when last_month_30
    then avg(revenue) over (order by "day" rows between 29 preceding and current row )  
  when last_month_29
    then avg(revenue) over (order by "day" rows between 28 preceding and current row)  
  when last_month_28
    then avg(revenue) over (order by "day" rows between 27 preceding and current row)  
    else avg(revenue) over (order by "day" rows between 30 preceding and current row ) 
  end as monthly_revenue
from ...

Probably not the cleanest way, it but works for all cases.

homam
  • 1,945
  • 1
  • 19
  • 26