What is wrong with this running total (cumulative column) in Oracle select with a window function?

Question

I have a query containing 0 or 1 in a column. For demonstration purposes I replaced the 1 with 77 so its more easy to see:

select dates.d the_date
, case TO_CHAR(dates.d, 'd') when '7' then 0 when '1' then 0 
    else 77 
  end as is_workday
from (SELECT (to_date('01.01.2019','dd.mm.yyyy') + (LEVEL -1)) 
      AS d FROM DUAL connect by level <=( 
      to_date('31.12.2020','dd.mm.yyyy')-(to_date('01.01.2019','dd.mm.yyyy')))
     ) as dates

The result is simply a consecutive date column and a marker if this date is a workday or not. (in real life I do a holiday calculation as well, but that's not the problem):

| THE_DATE            | IS_WORKDAY |
| 2019-01-01 00:00:00 | 77         |
| 2019-01-02 00:00:00 | 77         |
| 2019-01-03 00:00:00 | 77         |
| 2019-01-04 00:00:00 | 77         |
| 2019-01-05 00:00:00 |  0         |
| 2019-01-06 00:00:00 |  0         |
| 2019-01-07 00:00:00 | 77         |
| 2019-01-08 00:00:00 | 77         |
| 2019-01-09 00:00:00 | 77         |
| 2019-01-10 00:00:00 | 77         |
....

I want to add a running total over is_workday, meaning a cumulative value. I am sure Oracles window functions are made for this.

SELECT x.the_date
, x.is_workday
, sum(x.is_workday) over (
    partition by x.the_date          -- define the window
    order by x.the_date asc          -- order inside window
    rows between unbounded preceding -- sum to top
    and current row                  -- sum ending here
  ) as workdays_cumul
FROM (
  select dates.d the_date
  , case TO_CHAR(dates.d, 'd') when '7' then 0 when '1' then 0 
      else 77 
    end as is_workday
  from (SELECT (to_date('01.01.2019','dd.mm.yyyy') + (LEVEL -1)) 
        AS d FROM DUAL connect by level <=( 
        to_date('31.12.2020','dd.mm.yyyy')-(to_date('01.01.2019','dd.mm.yyyy')))
       ) as dates
) x
order by x.the_date
;

But I must miss something here, because I do not get a running total, but just the value itself.

| THE_DATE            | IS_WORKDAY | WORKDAYS_CUMUL |
| 2019-01-01 00:00:00 |  77        |  77            |
| 2019-01-02 00:00:00 |  77        |  77            |
| 2019-01-03 00:00:00 |  77        |  77            |
| 2019-01-04 00:00:00 |  77        |  77            |
| 2019-01-05 00:00:00 |   0        |   0            |
| 2019-01-06 00:00:00 |   0        |   0            |
| 2019-01-07 00:00:00 |  77        |  77            |
| 2019-01-08 00:00:00 |  77        |  77            |
....

Obviously it should be:

| THE_DATE            | IS_WORKDAY | WORKDAYS_CUMUL |
| 2019-01-01 00:00:00 |  77        |   77           |
| 2019-01-02 00:00:00 |  77        |  154           |
| 2019-01-03 00:00:00 |  77        |  231           |
...

I thought it would wirk like this:

sum(x.is_workday) -- do the sum over the 77-values
partition by x.the_date -- make windows/sections/parts with one row each (in my case)
order by x.the_date asc -- order those rows by date
rows between unbounded preceding -- sum between very first row...
and current row -- ... and the current row.

What am I missing here?

Partitioning by the date sums the value for each date (basically group by date). Remove it. — JohnHC, May 23 '19 at 09:25
@JohnHC ohh.... so I have to define "one big window" instead of "a lot of small windows"... I see! replacing with `partition by 1 order by x.the_date asc` seems do do the trick! Thanks, you can make an answer of that. Is `partition by 1` the proper way of "removing it"? — towi, May 23 '19 at 09:29
Not related to your issue, but `TO_CHAR(dates.d, 'd')` is [dependant on your NLS settings](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/Format-Models.html#GUID-E118F121-A0E1-4784-A685-D35CE64B4557); you'll get a different answer running this in [sessions saying they are in] the UK and USA for example, as the week starts on different days. — Alex Poole, May 23 '19 at 09:29
@AlexPoole Thanks. I did find a solution with `J` and `MOD 7` and wondered why that is. I wish oracle would mention that easy to spot in their docs at format `D`. -- https://stackoverflow.com/a/3451090/472245 — towi, May 23 '19 at 09:31
Well, I guess I've answered the first part of your comment on that answer *8-) The 'D' entry does say it depends on territory, but it isn't obvious what the effect is. Personally I use the fixed-language Sat/Sun check that question is asking about, but the 'J' calculation is interesting too - just less clear to future code maintainers maybe. — Alex Poole, May 23 '19 at 09:38
Actually `rows between unbounded preceding and current row` is the default, so you can simply write `SUM(x.is_workday) OVER (ORDER BY x.the_date)` — Wernfried Domscheit, May 23 '19 at 09:42
@WernfriedDomscheit Hell, now things fall into place for me. I didnt even realize that the quite simple `sum(x) over (y)` syntax even *is* a window function just with defaults. Now it all begins to make sense! — towi, May 23 '19 at 09:46

MT0 · Accepted Answer · 2019-05-23T10:12:10.563

Remove the PARTITION BY clause of your query as the scope you want for the window is the entire query and not each individual date:

SELECT the_date
,      is_workday
,      sum(is_workday) over (
         ORDER BY the_date asc            -- order inside window
         ROWS BETWEEN unbounded preceding -- sum to top
                  AND current row         -- sum ending here
       ) as workdays_cumul
FROM (
  select d the_date
  ,      CASE
         WHEN d - TRUNC( d, 'IW' ) IN ( 5, 6 )
         THEN 0
         ELSE 1 
         END AS is_workday
  FROM (
    SELECT DATE '2019-01-01' + LEVEL -1 AS d
    FROM   DUAL
    connect by level <= DATE '2020-12-31' - DATE '2019-01-01'
  )
)
order by the_date;

You can also use date literals, need to remove the AS in the table alias (and don't actually need any table aliases) and can use the difference between date and the start of its ISO week (d - TRUNC( d, 'IW' )) as a method of finding Saturday and Sunday that is independent of the NLS_TERRITORY session parameter.

You could also remove the ROWS BETWEEN clause as ROWS BETWEEN unbounded preceding AND current row is the default behaviour. However, if that is the behaviour you require then you may just as well leave it in to demonstrate that that window is your expected result.

Output:

THE_DATE  | IS_WORKDAY | WORKDAYS_CUMUL
:-------- | ---------: | -------------:
01-JAN-19 |          1 |              1
02-JAN-19 |          1 |              2
03-JAN-19 |          1 |              3
04-JAN-19 |          1 |              4
05-JAN-19 |          0 |              4
06-JAN-19 |          0 |              4
07-JAN-19 |          1 |              5
08-JAN-19 |          1 |              6
09-JAN-19 |          1 |              7
...
24-DEC-20 |          1 |            518
25-DEC-20 |          1 |            519
26-DEC-20 |          0 |            519
27-DEC-20 |          0 |            519
28-DEC-20 |          1 |            520
29-DEC-20 |          1 |            521
30-DEC-20 |          1 |            522

db<>fiddle here

What is wrong with this running total (cumulative column) in Oracle select with a window function?

1 Answers1