Connect by prior for a data series

Question

I have a table with arrivals and exits from a system with its time period. I also have a forecast of arrivals and exits. I would like to compute the starting and ending count of heads for the incomplete forecasted periods (ideally using merge and connect by).

Data to reproduce:

create table HEAD_COUNT 
(
  PERIOD_START DATE,
  HEAD_COUNT_START NUMBER ,
  HEAD_COUNT_END NUMBER ,
  ARRIVAL NUMBER ,
  EXITS NUMBER
);

Insert into HEAD_COUNT  values ('01-DEC-18',0,0,13,275);
Insert into HEAD_COUNT  values ('01-NOV-18',0,0,0,46);
Insert into HEAD_COUNT  values ('01-OCT-18',0,0,6,61);
Insert into HEAD_COUNT  values ('01-SEP-18',0,0,275,1292);
Insert into HEAD_COUNT  values ('01-AUG-18',0,0,46,1790);
Insert into HEAD_COUNT  values ('01-JUL-18',0,0,61,17);
Insert into HEAD_COUNT  values ('01-JUN-18',0,0,1292,3);
Insert into HEAD_COUNT  values ('01-MAY-18',0,0,1790,15);
Insert into HEAD_COUNT  values ('01-APR-18',0,0,17,158);
Insert into HEAD_COUNT  values ('01-MAR-18',0,0,3,9);
Insert into HEAD_COUNT  values ('01-FEB-18',0,0,15,0);
Insert into HEAD_COUNT  values ('01-JAN-18',0,0,158,4);
Insert into HEAD_COUNT  values ('01-DEC-17',0,0,9,179);
Insert into HEAD_COUNT  values ('01-NOV-17',250,186,0,64);
Insert into HEAD_COUNT  values ('01-OCT-17',276,250,4,30);
Insert into HEAD_COUNT  values ('01-SEP-17',638,276,179,541);

Ending Headcount of a period = starting head count + arrivals -exits.
Starting headcount of a period = ending headcount of the previous period

My initial attempt:

merge INTO head_count h USING
(SELECT period_start,
    head_count_start,
    head_count_end,
    arrival,
    exits
  FROM head_count
    CONNECT BY prior period_start = add_months(period_start, -1)
    START WITH period_start       = TRUNC(sysdate, 'Month')
)
src ON (h.period_start = src.period_start)
WHEN matched THEN
  UPDATE
  SET h.head_count_start = src.head_count_end,
    h.head_count_end     = h.head_count_start + h.arrival - h.exits

However, it is not producing the right results. Appreciate any help!

MERGE is certainly right, but why "preferably using ... connect by"? That doesn't make sense - analytic function were introduced exactly for this kind of task, and they are in the SQL Standard so they should work outside Oracle too (unlike CONNECT BY). Then - are any of the columns really "not null"? If so, it would be much better to include that in the table definition, so queries don't have to work around potential nulls when in fact you know beforehand there won't be any. — , Jan 01 '18 at 19:25

score 2 · Answer 1 · answered Jan 01 '18 at 08:31

Try Oracle's recursive subquery factoring clause instead of CONNECT BY. It requires some extra typing but in my experience the syntax makes more sense.

For example:

--Update the HEAD_COUNT using the last arrival and exits.
merge into head_count using
(
    --Summarize head count start and end.
    with head_count_cte(period_start, head_count_start, head_count_end, arrival, exits) as
    (
        --First month.
        select period_start, head_count_start, head_count_start + arrival - exits head_count_end, arrival, exits
        from head_count
        where period_start = (select min(period_start) from head_count)
        union all
        --All months after the first month.
        select
            head_count.period_start,
            head_count_cte.head_count_end as head_count_start,
            head_count_cte.head_count_end + head_count.arrival - head_count.exits as head_count_end,
            head_count.arrival, head_count.exits
        from head_count_cte
        join head_count
            on add_months(head_count_cte.period_start, 1) = head_count.period_start
    )
    select *
    from head_count_cte
    order by period_start desc
) recursive_head_count
    on (head_count.period_start = recursive_head_count.period_start)
when matched then update set
    head_count.head_count_start = recursive_head_count.head_count_start,
    head_count.head_count_end   = recursive_head_count.head_count_end;

Here's a complete SQL Fiddle with all the data and results.

Works great! Thank you Jon! – James Paul Jan 01 '18 at 09:50 — James Paul, Jan 01 '18 at 09:50

score 1 · Answer 2 · answered Jan 01 '18 at 12:33

Recursion can be replaced by analytic functions, this should work a little faster:

Demo: http://sqlfiddle.com/#!4/87500c/2

select t.* ,
       first_value( HEAD_COUNT_START ) over (order by PERIOD_START )
       +
       coalesce(sum( ARRIVAL - EXITS) over (order by PERIOD_START 
            ROWS between unbounded preceding and 1 preceding ),0) as Starting_headcount,

       first_value( HEAD_COUNT_START ) over (order by PERIOD_START )
       +
       sum( ARRIVAL - EXITS) over (order by PERIOD_START ) as Ending_Headcount

from HEAD_COUNT t
order by period_start;

|         PERIOD_START | HEAD_COUNT_START | HEAD_COUNT_END | ARRIVAL | EXITS | STARTING_HEADCOUNT | ENDING_HEADCOUNT |
|----------------------|------------------|----------------|---------|-------|--------------------|------------------|
| 2017-09-01T00:00:00Z |              638 |            276 |     179 |   541 |                638 |              276 |
| 2017-10-01T00:00:00Z |              276 |            250 |       4 |    30 |                276 |              250 |
| 2017-11-01T00:00:00Z |              250 |            186 |       0 |    64 |                250 |              186 |
| 2017-12-01T00:00:00Z |                0 |              0 |       9 |   179 |                186 |               16 |
| 2018-01-01T00:00:00Z |                0 |              0 |     158 |     4 |                 16 |              170 |
| 2018-02-01T00:00:00Z |                0 |              0 |      15 |     0 |                170 |              185 |
| 2018-03-01T00:00:00Z |                0 |              0 |       3 |     9 |                185 |              179 |
| 2018-04-01T00:00:00Z |                0 |              0 |      17 |   158 |                179 |               38 |
| 2018-05-01T00:00:00Z |                0 |              0 |    1790 |    15 |                 38 |             1813 |
| 2018-06-01T00:00:00Z |                0 |              0 |    1292 |     3 |               1813 |             3102 |
| 2018-07-01T00:00:00Z |                0 |              0 |      61 |    17 |               3102 |             3146 |
| 2018-08-01T00:00:00Z |                0 |              0 |      46 |  1790 |               3146 |             1402 |
| 2018-09-01T00:00:00Z |                0 |              0 |     275 |  1292 |               1402 |              385 |
| 2018-10-01T00:00:00Z |                0 |              0 |       6 |    61 |                385 |              330 |
| 2018-11-01T00:00:00Z |                0 |              0 |       0 |    46 |                330 |              284 |
| 2018-12-01T00:00:00Z |                0 |              0 |      13 |   275 |                284 |               22 |

And then the above query can be used as a subquery in the MERGE statement:

MERGE INTO HEAD_COUNT h
USING(
  select t.* ,
       first_value( HEAD_COUNT_START ) over (order by PERIOD_START )
       +
       coalesce(sum( ARRIVAL - EXITS) over (order by PERIOD_START 
            ROWS between unbounded preceding and 1 preceding ),0) as Starting_headcount,

       first_value( HEAD_COUNT_START ) over (order by PERIOD_START )
       +
       sum( ARRIVAL - EXITS) over (order by PERIOD_START ) as Ending_Headcount

  from HEAD_COUNT t
) t
ON (t.PERIOD_START = h.PERIOD_START )
WHEN MATCHED THEN UPDATE
   SET h.HEAD_COUNT_START = t.Starting_headcount,
       h.HEAD_COUNT_END = t.Ending_Headcount

Demo: http://sqlfiddle.com/#!4/34268/1

|         PERIOD_START | HEAD_COUNT_START | HEAD_COUNT_END | ARRIVAL | EXITS |
|----------------------|------------------|----------------|---------|-------|
| 2017-09-01T00:00:00Z |              638 |            276 |     179 |   541 |
| 2017-10-01T00:00:00Z |              276 |            250 |       4 |    30 |
| 2017-11-01T00:00:00Z |              250 |            186 |       0 |    64 |
| 2017-12-01T00:00:00Z |              186 |             16 |       9 |   179 |
| 2018-01-01T00:00:00Z |               16 |            170 |     158 |     4 |
| 2018-02-01T00:00:00Z |              170 |            185 |      15 |     0 |
| 2018-03-01T00:00:00Z |              185 |            179 |       3 |     9 |
| 2018-04-01T00:00:00Z |              179 |             38 |      17 |   158 |
| 2018-05-01T00:00:00Z |               38 |           1813 |    1790 |    15 |
| 2018-06-01T00:00:00Z |             1813 |           3102 |    1292 |     3 |
| 2018-07-01T00:00:00Z |             3102 |           3146 |      61 |    17 |
| 2018-08-01T00:00:00Z |             3146 |           1402 |      46 |  1790 |
| 2018-09-01T00:00:00Z |             1402 |            385 |     275 |  1292 |
| 2018-10-01T00:00:00Z |              385 |            330 |       6 |    61 |
| 2018-11-01T00:00:00Z |              330 |            284 |       0 |    46 |
| 2018-12-01T00:00:00Z |              284 |             22 |      13 |   275 |

This is definitely better than any recursive approach. What is missing, IMO, is a WHERE condition after "WHEN MATCHED", to ensure that only the rows that really need to be updated are updated. Especially in the long run, if the table becomes large but only a few rows representing "the future" must be updated, it will be important to not update a large number of rows to themselves. (That causes large and unnecessary overhead - unneeded undo and redo activity in the database). — , Jan 01 '18 at 19:23

Connect by prior for a data series

2 Answers2