0

I am trying to do a running average report and I have hit a road block:

When I select data for a time period, some of the groups in my report will have no data for that period, which implies a value of 0

So, I run the query below:


SELECT CY_WEEK,
       RC,
       AVG(DURATION_MINUTES) OVER (ORDER BY CY_WEEK ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS rolling_avg
FROM MI_REPORT where CY_WEEK in ('2022_08','2022_09','2022_10','2022_11') and RC = 'FOO';

Which produces:

2022_11,FOO,97

As we can see from selecting data just for this case, there was in fact, only 1 entry

select CY_WEEK, RC,DURATION_MINUTES from MI_REPORT where CY_WEEK in ('2022_08','2022_09','2022_10','2022_11') and RC = 'FOO'

And it produces the same results as the average:

2022_11,SCM,97

But the average actually needs to be 97/4 = 24.25

How can I achieve this without fabricating a bunch of back data?

Christian Bongiorno
  • 5,150
  • 3
  • 38
  • 76
  • As long as SQL works with tabular data, please post some sample data in tabular format and the output you want for this data along with a description of rules how to calculate this. It's not clear why `sum` should return an average and how to get the numbers in question without knowing the input. – astentx Mar 01 '23 at 06:50
  • Everyone's suggestions are great, but as soon as I extrapolate beyond this 1 case, the results fall apart. I am going to see if I can supply some tabular data that's sanitized so all the great ideas can come together – Christian Bongiorno Mar 01 '23 at 17:00
  • After understanding better what your issue is, I deleted my previous answer and adding a new one. It will give you a true running average for as many rows as you want while treating a missing row like a 0. – Paul W Mar 01 '23 at 17:16
  • I've created a whole new question and will delete this one. The new one is much clearer https://stackoverflow.com/questions/75631470/oracle-4-weeks-running-average-missing-values-must-imply-0 – Christian Bongiorno Mar 03 '23 at 20:10
  • I totally revamped the question and posted a new one. If you think it better, I can just overwrite this question – Christian Bongiorno Mar 03 '23 at 21:42

5 Answers5

1

It sounds like whatever you get for your sum, you just divide it by 4. You do need a Coalesce(calulation, 0) to turn nulls into zeros. So,

Coalesce(SUM(DURATION_MINUTES) OVER (ORDER BY CY_WEEK ROWS BETWEEN 3 PRECEDING AND CURRENT ROW),0)/4
Chris Maurer
  • 2,339
  • 1
  • 9
  • 8
  • I've created a whole new question and will delete this one. The new one is much clearer https://stackoverflow.com/questions/75631470/oracle-4-weeks-running-average-missing-values-must-imply-0 – Christian Bongiorno Mar 03 '23 at 20:10
1

There are not null values in your data, if I got it right, there are missing rows, though. It seems to me that your data looks like this:

WITH
    tbl (CY_WEEK, RC, DURATION_MINUTES) AS
        (
            Select '2021_09', 'FOO', 75 From Dual Union All
            Select '2021_11', 'FOO', 83 From Dual Union All
            Select '2022_01', 'FOO', 69 From Dual Union All
            Select '2022_03', 'FOO', 99 From Dual Union All
            Select '2022_06', 'FOO', 78 From Dual Union All
            Select '2022_07', 'FOO', 91 From Dual Union All
            Select '2022_11', 'FOO', 97 From Dual Union All
            Select '2022_12', 'FOO', 85 From Dual 
        ) 

If it is true then you should just divide the sum by 4. Something like here:

Select  CY_WEEK, RC, DURATION_MINUTES,
        Sum(DURATION_MINUTES) OVER() "TOTAL_MINS",
        4 "NUM_OF_WEEKS",
        Sum(DURATION_MINUTES) OVER() / 4 "WEEKLY_AVG_MINUTES"
From    tbl
Where   CY_WEEK Between '2022_08' And '2022_11' And RC = 'FOO'

CY_WEEK RC  DURATION_MINUTES TOTAL_MINS NUM_OF_WEEKS WEEKLY_AVG_MINUTES
------- --- ---------------- ---------- ------------ ------------------
2022_11 FOO               97         97            4              24.25 

With this sample data the result for previous 4 weeks (04 - 07) is:

...
Where   CY_WEEK Between '2022_04' And '2022_07' And RC = 'FOO'

CY_WEEK RC  DURATION_MINUTES TOTAL_MINS NUM_OF_WEEKS WEEKLY_AVG_MINUTES
------- --- ---------------- ---------- ------------ ------------------
2022_06 FOO               78        169            4              42.25 
2022_07 FOO               91        169            4              42.25

You don't need the windowing clause either as where condition filters the data.

ADDITION - regarding different RCs
If you want the result per RC without filtering in WHERE clause then you should do OVER(Partition By RC) in analytic function. Below find added 1 more row to the sample data RC=BAZ and adjusted code and result.

            ... ...
            Select '2022_11', 'BAZ', 43 From Dual Union All  -- new row in sample data
            ... ...

Select  CY_WEEK, RC, DURATION_MINUTES,
        Sum(DURATION_MINUTES) OVER(PARTITION BY RC) "TOTAL_MINS",
        4 "NUM_OF_WEEKS",
        Sum(DURATION_MINUTES) OVER(PARTITION BY RC) / 4 "WEEKLY_AVG_MINUTES"
From    tbl
Where   CY_WEEK Between '2022_08' And '2022_11'

CY_WEEK RC  DURATION_MINUTES TOTAL_MINS NUM_OF_WEEKS WEEKLY_AVG_MINUTES
------- --- ---------------- ---------- ------------ ------------------
2022_11 BAZ               43         43            4              10.75 
2022_11 FOO               97         97            4              24.25
d r
  • 3,848
  • 2
  • 4
  • 15
  • This is exactly my situation and I came to the same conclusion as you about using division, but the syntax I was completely lost on. However, when I take your query and remove the `RC = 'FOO'` clause, (so I can see the results for RCs, the numbers are completely off. Every RC has the exact same number of weekly avg minutes (which is I know isn't true), and worse, for the case in question, it's also not right. `2022_11,BAZ,16,4057,4,1014.25` `2022_11,FOO,97,4057,4,1014.25` I am trying to supply good data without spilling corp secrets, for sample purposes – Christian Bongiorno Mar 01 '23 at 16:35
  • @ChristianBongiorno If you remove RC from where clause and expect the result per RC then just replace OVER() with OVER(PARTITION BY RC) on both places in code. Analytic function in the answer was coded for the where clause filtering just one RC value ('FOO'). – d r Mar 01 '23 at 17:08
  • @ChristianBongiorno Just posted the addition, please check it out. – d r Mar 01 '23 at 17:19
  • I've created a whole new question and will delete this one. The new one is much clearer https://stackoverflow.com/questions/75631470/oracle-4-weeks-running-average-missing-values-must-imply-0 – Christian Bongiorno Mar 03 '23 at 20:10
1

You can generate a calendar with all the weeks in your range and the 3 preceding weeks, to cover the rolling window you are averaging, and then join that to your data to fill in the missing rows and find the averages and, after that, filter on the range of weeks:

WITH weeks (week) AS (
  SELECT '2022_05' FROM DUAL UNION ALL
  SELECT '2022_06' FROM DUAL UNION ALL
  SELECT '2022_07' FROM DUAL UNION ALL
  SELECT '2022_08' FROM DUAL UNION ALL
  SELECT '2022_09' FROM DUAL UNION ALL
  SELECT '2022_10' FROM DUAL UNION ALL
  SELECT '2022_11' FROM DUAL
),
averages (cy_week, rc, rolling_avg) AS (
  SELECT w.week,
         r.rc,
         AVG(COALESCE(r.duration_minutes, 0)) OVER (
           ORDER BY w.week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
         ) AS rolling_avg
  FROM   weeks w
         LEFT OUTER JOIN MI_REPORT r
         PARTITION BY (r.rc)
         ON (w.week = r.cy_week)
  WHERE  RC = 'FOO'
)
SELECT cy_week,
       rc,
       rolling_avg
FROM   averages
WHERE  cy_week in ('2022_08','2022_09','2022_10','2022_11');

Which, for the sample data:

CREATE TABLE MI_REPORT (cy_week, rc, duration_minutes) AS
SELECT '2022_11', 'FOO', 97 FROM DUAL;

Outputs:

CY_WEEK RC ROLLING_AVG
2022_08 FOO 0
2022_09 FOO 0
2022_10 FOO 0
2022_11 FOO 24.25

fiddle

MT0
  • 143,790
  • 11
  • 59
  • 117
  • Thanks for the assist - I am going to try to produce some sanitized data so that we can get exactly what I need. Your query doesn't appear right when extrapolated, but it's also making me question if what I have to compare it to is accurate. – Christian Bongiorno Mar 01 '23 at 17:02
  • I've created a whole new question and will delete this one. The new one is much clearer https://stackoverflow.com/questions/75631470/oracle-4-weeks-running-average-missing-values-must-imply-0 – Christian Bongiorno Mar 03 '23 at 20:10
1

I understand that you want a running average but in the absence of a row within the desired past 4 weeks, you want to treat that missing row as a value of 0, rather than simply omitting it from the average. You are right, you can certainly backfill missing rows with 0s and use an AVG with a ROWS BETWEEN windowing. But if you wish to do it without backfilling, you can join the table back to itself to find historical rows within the 4-week window, add their values up and divide by 4. Then any missing row would have the same effect as it would if it were present with a 0.

Note that for this to work, you have to do date arithmetic. That requires the use of real dates, not strings like 2022_11. So in my answer below, after generating some test data for testing only, I then convert the strings into real dates, so that the next phase can do the date arithmetic properly. Also, I'm assuming CY_WEEK is an ISO week number, not a month. Obviously you can adjust the date conversion and arithmetic if it's something other than a week number.

This does give a result of 24.25:

WITH
    MI_REPORT AS
        (
            Select '2021_09' CY_WEEK, 'FOO' RC, 75 DURATION_MINUTES From Dual Union All
            Select '2021_11', 'FOO', 83 From Dual Union All
            Select '2022_01', 'FOO', 69 From Dual Union All
            Select '2022_03', 'FOO', 99 From Dual Union All
            Select '2022_06', 'FOO', 78 From Dual Union All
            Select '2022_07', 'FOO', 91 From Dual Union All
            Select '2022_11', 'FOO', 97 From Dual Union All
            Select '2022_12', 'FOO', 85 From Dual 
        ),
       fixeddate AS
       (SELECT TO_DATE(year,'YYYY') + TO_NUMBER(week)*7 cy_date,
               cy_week,
               rc,
               duration_minutes
          FROM (SELECT cy_week,
                       SUBSTR(cy_week,1,4) year,
                       SUBSTR(cy_week,6,2) week,
                       rc,
                       duration_minutes
                  FROM mi_report))
SELECT curr.CY_WEEK,
       curr.RC,
       SUM(prev.DURATION_MINUTES)/4 rolling_avg
 FROM fixeddate curr,
      fixeddate prev
WHERE curr.RC = 'FOO'
  AND curr.rc = prev.rc
  AND prev.cy_date > curr.cy_date - (7*4) 
  AND prev.cy_date <= curr.cy_date
  AND curr.CY_WEEK in ('2022_08','2022_09','2022_10','2022_11')
GROUP BY curr.CY_WEEK,
         curr.RC
     
Paul W
  • 5,507
  • 2
  • 2
  • 13
  • I've created a whole new question and will delete this one. The new one is much clearer https://stackoverflow.com/questions/75631470/oracle-4-weeks-running-average-missing-values-must-imply-0 – Christian Bongiorno Mar 03 '23 at 20:11
1

For a correct answer, your base table should be calendar. All transactions for the report should be joined to it and then calculated.

Here is the simple calendar factory to easily produce days or months and not bother with maintaining/calculating right amount and so on:

create or replace type list_date is table of date;
/

create or replace function days_factory(p_date_from date, p_date_to date, p_mode in varchar2 default 'DD') return list_date
pipelined deterministic as
  begin
    if p_mode = 'DD' then
      for i in (select trunc(p_date_from, p_mode) + (level - 1) calterm
                  from dual
                connect by level <= (1 + trunc(p_date_to) - trunc(p_date_from)))         loop
        pipe row(i.calterm);
      end loop;
    else
      for i in (select add_months(trunc(p_date_from, p_mode), level - 1) calterm
                  from dual
                connect by level <= (1 + months_between(trunc(p_date_to, p_mode), trunc(p_date_from, p_mode)))) loop
        pipe row(i.calterm);
      end loop;
    end if;
  end;
/

After creating these objects, use select * from table(days_factory(date '2022-08-01', date '2022-12-31', 'MM')) to get months table, then left join your sales/whatever and only after this you can calculate correct averages.

Dima Yankin
  • 365
  • 1
  • 12