Update gaps in sequential table

Question

I have a table that contains employee bank data

Employee      |Bank          |Date           |Delta
---------------------------------------------------
Smith         |Vacation      |2023-01-01     |15.0
Smith         |Vacation      |2023-01-02     |Null
Smith         |Vacation      |2023-01-03     |Null
Smith         |Vacation      |2023-01-04     |7.5

I would like to write a statement so that I can update 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, I want to use the value from the most recent row that isn't > than the date on the row.

Once complete, I want the table to look like this:

Employee      |Bank          |Date           |Delta
---------------------------------------------------
Smith         |Vacation      |2023-01-01     |15.0
Smith         |Vacation      |2023-01-02     |15.0
Smith         |Vacation      |2023-01-03     |15.0
Smith         |Vacation      |2023-01-04     |7.5

The source table has a unique index consisting of Employee, Bank and Date descending. There could be up to 2 billion rows in the table.

I currently update the table with the following, but I am wondering if there is a more efficient way to do so?

WITH cte_date
     AS (SELECT dd.date_key, 
                db.balance_key, 
                feb.employee_key
         FROM shared.dim_date dd
              CROSS JOIN
         (
             SELECT DISTINCT 
                    employee_key
             FROM wfms.fact_employee_balance
         ) feb
              CROSS JOIN wfms.dim_balance db
         WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
     SELECT dd.*, 
            t.delta
     INTO wfms.test2
     FROM cte_date dd
          LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
                                  AND dd.employee_key = t.employee_key
                                  AND t.date_key =  (SELECT TOP 1 tt1.date_key
                                                    FROM    wfms.test1 tt1
                                                    WHERE   tt1.balance_key = t.balance_key
                                                    AND     tt1.employee_key = t.employee_key
                                                    AND     tt1.date_key < dd.date_key);

What is your actual _database_ (for example SQL Server, Postgres, etc.) ? — Tim Biegeleisen, Feb 04 '23 at 01:51
Please see: https://stackoverflow.com/questions/16669620/how-to-get-previous-value-for-null-values — Tim Biegeleisen, Feb 04 '23 at 02:38

John Cappelletti · Answer 1 · 2023-02-04T03:28:08.057

Just for fun, I wanted to test an idea.

For the moment, lets assume the gaps are not too wide ... In this example 7 days.

On a relative to batch, the lag() over() approach was 22% while the Cross Apply was 78%.

Again, Just for fun

Select Employee
      ,Bank
      ,Date
      ,Delta = coalesce(A.Delta
                       ,lag(Delta,1) over (partition by Employee,Bank order by date)
                       ,lag(Delta,2) over (partition by Employee,Bank order by date)
                       ,lag(Delta,3) over (partition by Employee,Bank order by date)
                       ,lag(Delta,4) over (partition by Employee,Bank order by date)
                       ,lag(Delta,5) over (partition by Employee,Bank order by date)
                       ,lag(Delta,6) over (partition by Employee,Bank order by date)
                       ,lag(Delta,7) over (partition by Employee,Bank order by date)
                       )
 From  YourTable A

Versus

Select Employee
      ,Bank
      ,Date
      ,Delta = coalesce(A.Delta,B.Delta)
 From  YourTable A
 Cross Apply ( Select top 1 Delta 
                From  YourTable 
                Where Employee=A.Employee
                  and A.Bank = Bank
                  and Delta is not null
                  and A.Date>=Date
                 Order By Date desc
             ) B

Update

Same results with 20 days

score 0 · Accepted Answer · answered Feb 04 '23 at 09:45

Here is another way. Using sum() with window function to find the group "Grp" of rows (1 row with not null with subsequent rows of null). Finally max(Delta) of the Grp to return the not null value.

select  Employee, Bank, [Date], max  (max(Delta)) 
                                over (partition by Employee, Bank, Grp)
from 
(
    select  *, Grp = sum  (case when Delta is not null then 1 else 0 end) 
                     over (partition by Employee,Bank 
                               order by [Date]) 
    from    YourTable
) t
group by Employee, Bank, [Date], Grp

Thanks all who replied. The first reply wouldn't work in this case only because the number of days in between the gaps could be very long (up to 2 years) if the particular bank is not very volatile. — user8015860, Feb 06 '23 at 22:15

Update gaps in sequential table

2 Answers2