Recursive snowflake query of monthly snapshots for time series analysis dropping records prior to change for users who changed department

Question

This is a successor question to this question which explains the objective of this query and provides a sample of the source data.

With help, I have this recursive query running which is more efficient than my non-recursive query, repeated 36 times and unioned together.

The purpose of this query is to know which department an employee was in at the end of each month. The problem with this code is that for employees who changed departments, it is returning only the month end department value for months subsequent to the most recent department change, and no prior records. For employees who changed departments, the output should contain this data:

Month - Department Code
0 - 100
1 - 100
2 - 200
3 - 200

And it is currently returning:

Month - Department Code
0 - 100
1 - 100

Here is the query:

WITH Q AS (
    select 
        row_number() over(order by null) as q_level,
        last_day(dateadd(month, -q_level, CURRENT_DATE), month) as last_day_month
    from table(generator(ROWCOUNT=>36))
), Q1 AS (
    select 
        q.q_level
        ,q.last_day_month
        ,v_dept_history_adj.associate_id             
        ,v_dept_history_adj.home_department_code
        ,v_dept_history_adj.position_effective_date
        ,max(position_effective_date) OVER(PARTITION BY v_dept_history_adj.associate_id) AS most_recent_record 
    from datawarehouse.srctable
        ,Q
    where v_dept_history_adj.position_effective_date <= q.last_day_month
)
select 
    associate_id
    ,position_effective_date
    ,home_department_code
    ,most_recent_record
    ,last_day_month AS month
FROM Q1
where position_effective_date = most_recent_record
order by month desc, position_effective_date desc

Your query involved more columns and tables than what your sample data indicates so it's very difficult to tell what exactly you're trying to achieve. Just give us 10-20 rows of sample data and the expected output for it. And explain the business logic involved. It's often easier to build a solution than to debug an existing solution that doesn't work, and more more importantly is difficult to understand. Here's an example question for inspiration https://stackoverflow.com/questions/27680999/resetting-row-number-according-to-record-data-change — Radagast, Nov 18 '22 at 16:51

score 0 · Answer 1 · answered Nov 21 '22 at 00:51

So no that the larger picture of your questions makes sense:

To get the most resent department per month for each employee, I would write this query like so:

with emp_data(emp_id, dep_id, date) as (
    select * from values
    (1, 10, '2022-01-01'::date),
    (1, 20, '2022-07-10'::date),
    (2, 10, '2022-07-14'::date)
), last_36_months as (
    select 
        row_number() over(order by null) as q_level,
        last_day(dateadd(month, -q_level, CURRENT_DATE), month) as last_day_month
    --from table(generator(ROWCOUNT=>36))
    from table(generator(ROWCOUNT=>12))
), month_end_data as (
    select 
        e.emp_id
        ,e.dep_id
        ,l.last_day_month as month
    from last_36_months as l
    join emp_data as e
        on e.date <= l.last_day_month
    qualify row_number() over(partition by e.emp_id, l.last_day_month order by e.date desc) = 1
)
select * 
from month_end_data
order by 1,3 desc;

I reduced 36 to 12, and moved the data to 2022 so the output was less verbose, but it gives:

EMP_ID	DEP_ID	MONTH
1	20	2022-10-31
1	20	2022-09-30
1	20	2022-08-31
1	20	2022-07-31
1	10	2022-06-30
1	10	2022-05-31
1	10	2022-04-30
1	10	2022-03-31
1	10	2022-02-28
1	10	2022-01-31
2	10	2022-10-31
2	10	2022-09-30
2	10	2022-08-31
2	10	2022-07-31

which seems more aligned to what you want, and simpler to read

this works! It is a thing of beauty. Thank you! – Gavin Nov 23 '22 at 20:22 — Gavin, Nov 23 '22 at 20:22

Recursive snowflake query of monthly snapshots for time series analysis dropping records prior to change for users who changed department

1 Answers1