0

I am trying to create a 4 week running average of data, and then pivot it for a pivot table report. I am basically porting a spreadsheet-as-a-DB and it's reports and I am just starting to realize how sloppy and forgiving excel is.

The goal is to take the running 4 week average, of all the RCs (even if they have no data for that week).

  • So, if FOO had no data for the last 4 weeks, it needs to still show a rolling average of 0
  • If there was no data for any part of the last 4 weeks, that value must be treated as 0 for that RC

Below is some sanitized, sample data which contains 3 weeks of data. I've supplied sample of output for 2 weeks, so those who wanna take a stab at this still have a mystery week to prove out their query.

CREATE TABLE MY_TABLE
(
    cy               int          NOT NULL,
    week             int          NOT NULL,
    RC               VARCHAR2(10) NOT NULL,
    duration_minutes number       Not NULL
);

INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'HCM', 86);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'ERP', 12);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'PBI', 16);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'PBI', 75);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'ERP', 24);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'ERP', 48);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'CRM', 25);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'HCM', 43);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'TBD', 4);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'POTHER', 9);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'POTHER', 14);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'POTHER', 26);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'HCM', 240);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'PBI', 187);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'HCM', 67);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'PBI', 491);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'PBI', 192);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'PBI', 57);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'PWC', 499);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'TBD', 105);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'HCM', 37);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'POTHER', 380);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'TBD', 46);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'POTHER', 140);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'HCM', 116);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'PWC', 69);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'PBI', 54);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'PALPHABET', 128);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'HCM', 172);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'HCM', 96);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'PDT', 92);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 350);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'TBD', 8);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'TBD', 180);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 13);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 25);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 37);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'HCM', 184);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'HCM', 59);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 24);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 59);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'HCM', 65);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PBU', 209);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 242);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 11, 'SCM', 97);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 11, 'PBU', 240);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 11, 'PIO', 16);

commit;

This query is not ideal because I have to spell out all of the weeks. I need a query that just does this across all data. Maybe a clause of where CY_WEEK between MIN(CY_WEEK) and MAX(CY_WEEK), but this would still not account for missing data.

select * from (
    select '2022_10' as CY_WEEK, RC, SUM(s) / 4 as rolling_avg
    from (
           select RC, SUM(DURATION_MINUTES) as s
           from MY_TABLE where CY = 2022 and WEEK between 07 and 10
           group by cy,week, RC order by  RC
    ) group by RC union all

    select '2022_11' as CY_WEEK, RC, SUM(s) / 4 as rolling_avg
    from (
           select RC, SUM(DURATION_MINUTES) as s
           from MY_TABLE where CY = 2022 and WEEK between 08 and 11
           group by cy,week, RC order by  RC
    ) group by RC

) order by CY_WEEK, RC;

The unpivoted data, when I am done, looks like this (Ideally: every week would have 1 line, with the 4 week rolling average, for all RCs, even if they had no data). I've added the comment field just to articulate the issue, thought it's not in the query:

CY_WEEK RC ROLLING_AVG comment
2022_10 HCM 269.75
2022_10 PALPHABET 32
2022_10 PBI 245.25
2022_10 SCM 0 no scm data for the week
2022_10 PBU 52.25
2022_10 PDT 23
2022_10 PIO 187.5
2022_10 POTHER 142.25
2022_10 PWC 142
2022_10 TBD 85.75
2022_11 HCM 182.25
2022_11 PALPHABET 32
2022_11 PBI 198.5
2022_11 PBU 112.25
2022_11 PDT 23
2022_11 PIO 191.5
2022_11 POTHER 130
2022_11 PWC 142
2022_11 SCM 24.25
2022_11 TBD 84.75

Ideal format:

CY_WEEK HCM PALPHABET PBI SCM PBU PDT PIO POTHER PWC TBD
2022_11 269.75 32 245.25 0 52.25 23 187.5 142.25 142 85.75
2022_11 182.25 32 198.5 24.25 112.25 23 191.5 130 142 85.75

There are a lot of RCs. It would be great to not have to list them all in the query

Christian Bongiorno
  • 5,150
  • 3
  • 38
  • 76
  • How do you handle the year wrapping? For `2023_01` what value is 4 weeks back? – EdmCoff Mar 03 '23 at 20:25
  • Fantastic question... for another day :) – Christian Bongiorno Mar 03 '23 at 20:32
  • Okay, but if you want a solution that gets all the weeks between the minimum week and the maximum week, the wrapping is kind of important. Should we just assume all the weeks are 2022? – EdmCoff Mar 03 '23 at 20:41
  • For this excercise, yes. Your question is totally valid though. It's just not one I am ready to answer – Christian Bongiorno Mar 03 '23 at 20:45
  • I've posted an answer that I think is pretty close to what you want, although some of the values seem to be different than your expected output. However, my numbers seem to match your attempted query. Either way, hopefully it helps with your main problems of not hardcoding any dates in the query and doing the pivot. – EdmCoff Mar 03 '23 at 22:14

4 Answers4

2

You can use a PARTITIONed OUTER JOIN to ensure that there is a row for each rc/cy/week combination and then use the analytic SUM function to get the rolling total for the past 4 weeks and then PIVOT:

WITH rcs (rc) AS (
  SELECT 'HCM' FROM DUAL UNION ALL
  SELECT 'PALPHABET' FROM DUAL UNION ALL
  SELECT 'PBI' FROM DUAL UNION ALL
  SELECT 'SCM' FROM DUAL UNION ALL
  SELECT 'PBU' FROM DUAL UNION ALL
  SELECT 'PDT' FROM DUAL UNION ALL
  SELECT 'PIO' FROM DUAL UNION ALL
  SELECT 'POTHER' FROM DUAL UNION ALL
  SELECT 'PWC' FROM DUAL UNION ALL
  SELECT 'TBD' FROM DUAL
)
SELECT *
FROM   (
  SELECT cy,
         week,
         r.rc,
         SUM(COALESCE(duration_minutes, 0)) OVER (
           PARTITION BY r.rc
           ORDER BY TO_DATE(cy||'-01-01', 'YYYY-MM-DD') + week * 7 - 1
           RANGE BETWEEN INTERVAL '21' DAY PRECEDING AND CURRENT ROW
         ) / 4 AS avg_duration
  FROM   rcs r
         LEFT OUTER JOIN my_table m
         PARTITION BY (m.cy, m.week)
         ON (r.rc = m.rc)
)
PIVOT (
  MAX(avg_duration)
  FOR rc IN (
    'HCM' AS hcm,
    'PALPHABET' AS palphabet,
    'PBI' AS pbi,
    'SCM' AS scm,
    'PBU' AS pbu,
    'PDT' AS pdt,
    'PIO' AS pio,
    'POTHER' AS pother,
    'PWC' AS pwc,
    'TBD' AS tdb
  )
)
ORDER BY cy, week;

Which, for the sample data, outputs:

CY WEEK HCM PALPHABET PBI SCM PBU PDT PIO POTHER PWC TDB
2022 6 21.5 0 22.75 0 0 0 0 0 0 0
2022 7 109 0 69.5 0 0 0 0 12.25 0 1
2022 8 147.25 0 254.5 0 0 0 0 142.25 124.75 38.75
2022 9 214.25 32 268 0 0 23 0 142.25 142 38.75
2022 10 269.75 32 245.25 0 52.25 23 187.5 142.25 142 85.75
2022 11 182.25 32 198.5 24.25 112.25 23 191.5 130 142 84.75

fiddle

MT0
  • 143,790
  • 11
  • 59
  • 117
  • I've updated the question to use numbers for year and week. This is looking promising – Christian Bongiorno Mar 03 '23 at 23:23
  • When I run your query, I get `[72000][12801] ORA-12801: error signaled in parallel query server P00T, instance 5 ORA-01861: literal does not match format string Position: 0` – Christian Bongiorno Mar 03 '23 at 23:39
  • You definitely win the answer. One bonus: Is there anyway to do this without spelling out all the RCs ? I mean, it's the not end of the world if not – Christian Bongiorno Mar 04 '23 at 00:38
  • @ChristianBongiorno No, you cannot `PIVOT` on an unknown list; you must list all the values. – MT0 Mar 04 '23 at 00:41
  • * Is there any way to do this without spelling out all the RCs ?*I n Oracle this is si very very painful e.g. https://stackoverflow.com/questions/15491661/dynamic-pivot-in-oracles-sql I'm not sure why you didn't just pursue an Excel solution – Paul Maxwell Mar 04 '23 at 01:33
  • I am specifically moving away from excel @PaulMaxwell – Christian Bongiorno Mar 06 '23 at 17:29
  • @MT0 - wanna take a stab at this: https://stackoverflow.com/questions/75666410/oracle-sql-how-to-sum-groups-with-total-of-all-groups – Christian Bongiorno Mar 07 '23 at 19:31
1

You'll possibly want to use MTO's answer, since it operates on true dates (so it will handle the week reseting after the year changes) and is cleaner in how in handles the coalesces. However, I have adjusted my answer to work with your new data and new fields:

WITH ranges AS
(
 select distinct week s FROM my_table
)
select cy, week, COALESCE(HCM, 0) HCM, COALESCE(PALPHABET, 0) PALPHABET, COALESCE(PBI, 0) PBI, COALESCE(SCM, 0) SCM, COALESCE(PBU, 0) PBU, COALESCE(PDT, 0) PDT, COALESCE(PIO, 0) PIO, COALESCE(POTHER, 0) POTHER, COALESCE(PWC, 0) PWC, COALESCE(TBD, 0) TBD
from (
 SELECT cy, s week, rc, SUM(duration_minutes) rolling_avg
 FROM ranges r left outer join my_table m
   ON week > r.s - 4
    AND week <= r.s
 GROUP BY cy,s,rc
)
pivot 
(
   sum(rolling_avg/4)
   for rc IN ('HCM' HCM, 'PALPHABET' PALPHABET, 'PBI' PBI, 'SCM' SCM, 'PBU' PBU, 'PDT' PDT, 'PIO' PIO, 'POTHER' POTHER, 'PWC' PWC, 'TBD' TBD)
)
ORDER BY cy, week

You can see it in this Fiddle.

Original answer:

I think you are looking for something like:

WITH ranges AS
(
 select (SELECT MIN(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE) + level - 1 s
 from   dual
 connect by level <= (SELECT MAX(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE) - (SELECT MIN(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE)+1 
)
select cy_week, COALESCE(TBD, 0) TBD, COALESCE(HCM, 0) HCM, COALESCE(SCM, 0) SCM, COALESCE(PIO, 0) PIO, COALESCE(PBU, 0) PBU, COALESCE(FOO, 0) FOO 
from (
 SELECT '2022_'||r.s cy_week, rc, SUM(duration_minutes) rolling_avg
 FROM ranges r left outer join my_table m
   ON CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT) > r.s - 4
    AND CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT) <= r.s
 GROUP BY s,rc
)
pivot 
(
   sum(rolling_avg/4)
   for rc IN ('TBD' TBD, 'HCM' HCM, 'SCM' SCM, 'PIO' PIO, 'PBU' PBU, 'FOO' FOO)
)
ORDER BY CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) as int)

You can see it working in this Fiddle. This is a bit messy and can likely be cleaned up a bit, but hopefully it's a good starting point.

The ranges cte gets all the weeks between your first and your last date. However, if you know that all weeks exist in your data, you could simplify this to something like

WITH ranges AS
(SELECT DISTINCT CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT) s FROM MY_TABLE)

It would also be easier to work with actual dates, since this simply treats the part of the string after the underscore as an integer for comparison. You would need to add logic to handle wrapping (if the week number resets when the year changes). You could also clean this up a bit by putting the substring/cast in one CTE and run the query against that (so that logic is not repeated in a number of places).

Since you explicitly said you want 0s, I wrote out the COALESCEs in the SELECT statement (there might be a better way of doing this). If nulls were fine, you could just use SELECT *.

EdmCoff
  • 3,506
  • 1
  • 9
  • 9
  • Yes! This seems to be it. I will update the original insert data to use numbers instead of concats – Christian Bongiorno Mar 03 '23 at 22:27
  • Ok, the question has been updated and I removed dummy data. I don't know how, but it seems I got more data on this run even with the data posted here. I must have trimmed it for some reason in the original post. I have also changed the year and week to their numeric forms – Christian Bongiorno Mar 03 '23 at 23:21
0

Not enough material to flesh out the whole answer, but the basic strategy would be to

  1. Create a temporary table with the 4-week data. Maybe you want to supply this manually, or compute it from CURRENT_DATE()
  2. Create a temporary table with all possible values for RC : SELECT DISTINCT RC FROM MY_TABLE
  3. Do a CROSS JOIN of those two tables
  4. Do a LEFT JOIN of pairs from the cross join to MY_TABLE
dami
  • 146
  • 8
0

With MY_TABLE data as provided you could do it like here:

  1. create cte generating weeks from 0 to 52
WITH
    weeks AS
        (   Select LEVEL - 1 "WEEK" From Dual  Connect By LEVEL <= 53   ),
  1. create cte summing the DURATION_MINUTES per weeks and fill in all the weeks data
    minutes as
        (   Select  DISTINCT y.CY, t.RC,
                    To_Number(CASE WHEN w.WEEK - 3 < 0 THEN Nvl(t.CY, y.CY) - 1 ELSE Nvl(t.CY, y.CY) END) ||
                            LPAD(CASE WHEN w.WEEK - 3 < 0 THEN 53 + w.WEEK - 4 ELSE w.WEEK - 3 END, 2, '0') "WEEK_FROM",
                    To_Number(y.CY || LPAD(w.WEEK, 2, '0')) "WEEK",
                    Sum(CASE WHEN To_Number(t.CY || LPAD(t.WEEK, 2, '0')) = To_Number(y.CY || LPAD(w.WEEK, 2, '0')) 
                             THEN t.DURATION_MINUTES 
                        ELSE 0 END) OVER(Partition By t.RC, y.CY, w.WEEK) "WEEK_MINS"
            From weeks w
            Inner Join (Select CY From MY_TABLE Group By CY ORDER BY CY) y ON(1=1)
            Left Join  MY_TABLE t ON(1 = 1)
            ), 
  1. create cte that will do the 4 week window sums of the above data and set a WHERE condition for the period you want. Without it you will get (with this sample data) 53 rows (looks interesting) where you can see when the durations occured.
    grid AS
        (  Select  CY, RC, To_Number(WEEK_FROM) "WEEK_FROM", To_Number(WEEK) "WEEK",
                      Sum(WEEK_MINS) 
                          OVER(Partition By RC Order By WEEK Rows Between 3 Preceding And Current Row) "FOUR_WEEK_MINUTES",
                      0 as HCM, 0 as PALPHABET, 0 as PBI, 0 as SCM, 0 as PBU, 0 as PDT, 0 as PIO, 0 as POTHER, 0 as PWC, 0 as TBD
            From   minutes
            WHERE    WEEK - WEEK_FROM IN(3, 51)   And
                     CY = 2022 And To_Number(SubStr(WEEK, 5, 2)) Between 6 And 11 
        )
  1. Pivot the result
SELECT  *
FROM (   Select    WEEK "WEEK", g.RC "RC", g.FOUR_WEEK_MINUTES "FOUR_WEEK_MINUTES"
        From      grid g
     )
    PIVOT ( SUM(FOUR_WEEK_MINUTES / 4)  FOR RC IN ('HCM' AS HCM, 'PALPHABET' AS PALPHABET, 'PBI' AS PBI, 'SCM' AS SCM, 'PBU' AS PBU,
                                                   'PDT' AS PDT, 'PIO' AS PIO, 'POTHER' AS POTHER, 'PWC' AS PWC, 'TBD' AS TBD )
          )
ORDER BY WEEK

... and the resulting dataset should be

      WEEK        HCM  PALPHABET        PBI        SCM        PBU        PDT        PIO     POTHER        PWC        TBD
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
    202206       21.5          0      22.75          0          0          0          0          0          0          0 
    202207        109          0       69.5          0          0          0          0      12.25          0          1 
    202208     147.25          0      254.5          0          0          0          0     142.25     124.75      38.75 
    202209     214.25         32        268          0          0         23          0     142.25        142      38.75 
    202210     269.75         32     245.25          0      52.25         23      187.5     142.25        142      85.75 
    202211     182.25         32      198.5      24.25     112.25         23      191.5        130        142      84.75

Didn't test it for periods envolving cross years, but this should proces the data from previous year if it is within a 4 week window.
With current sample data to see the rows from before the 4 weeks window and after it - put the period limitiing the weeks between 5 and 15 to see all affected rows - result is:

      WEEK        HCM  PALPHABET        PBI        SCM        PBU        PDT        PIO     POTHER        PWC        TBD
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
    202205          0          0          0          0          0          0          0          0          0          0 
    202206       21.5          0      22.75          0          0          0          0          0          0          0 
    202207        109          0       69.5          0          0          0          0      12.25          0          1 
    202208     147.25          0      254.5          0          0          0          0     142.25     124.75      38.75 
    202209     214.25         32        268          0          0         23          0     142.25        142      38.75 
    202210     269.75         32     245.25          0      52.25         23      187.5     142.25        142      85.75 
    202211     182.25         32      198.5      24.25     112.25         23      191.5        130        142      84.75 
    202212        144         32       13.5      24.25     112.25         23      191.5          0      17.25         47 
    202213         77          0          0      24.25     112.25          0      191.5          0          0         47 
    202214          0          0          0      24.25         60          0          4          0          0          0 
    202215          0          0          0          0          0          0          0          0          0          0
d r
  • 3,848
  • 2
  • 4
  • 15