5

Does anyone know how to replace the nulls in a column with the nearest non-null string above them, so that each string fills the nulls below it until the next string appears? I have a column that looks like this:

Original Column:

PAST_DUE_COL           
91 or more days pastdue        
Null
Null
61-90 days past due          
Null
Null
31-60 days past due
Null
0-30 days past due
Null       
Null
Null            

Expected Result Column:

PAST_DUE_COL           
91 or more days past due        
91 or more days past due
91 or more days past due
61-90 days past due          
61-90 days past due 
61-90 days past due 
31-60 days past due
31-60 days past due
0-30 days past due
0-30 days past due      
0-30 days past due
0-30 days past due

Essentially I want the first string in the column to replace all null values below it until the next string. Then that string will replace all nulls below it until the next string and so on.

GMB
Ryan
  • You need a column to `order` the records. – GMB Feb 07 '20 at 00:52
  • Hey, thank you for pointing that out. I wasn't entirely sure how that worked, but I went through my old answers and checked them off. Everyone on here has been so helpful, and I want to make sure credit is given where it is due. – Ryan Feb 07 '20 at 23:21

4 Answers

11

SQL Server 2022 added support for the ignore nulls option in window functions: yay!

We can just use last_value():

select t.*,
    last_value(past_due_col) 
        ignore nulls 
        over(order by id) new_past_due_col
from mytable t

Demo on DB Fiddle.
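
To make the semantics concrete, here is a minimal Python sketch of what last_value() with ignore nulls computes when the rows are already sorted by the ordering column. The function name fill_down and the sample values are only illustrative, not part of the answer above.

```python
def fill_down(values):
    """Carry the most recent non-null value forward over the nulls that follow it."""
    last_seen = None  # stays None until the first non-null value appears
    filled = []
    for v in values:
        if v is not None:
            last_seen = v
        filled.append(last_seen)
    return filled


# Hypothetical sample, assumed to be sorted by id already.
sample = ["91 or more days past due", None, None, "61-90 days past due", None]
filled_sample = fill_down(sample)
```

Rows that come before the first non-null value stay null, which matches how the window function behaves.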


In earlier versions (before 2022), where SQL Server window functions do not support the ignore nulls option, we can work around this with a gaps-and-islands technique:

select t.*, max(past_due_col) over(partition by grp) new_past_due_col
from (
    select t.*, count(past_due_col) over(order by id) grp
    from mytable t
) t

The subquery does a window count that increments every time a non-null value is found (count() skips nulls). This defines groups of rows, each made of one non-null value followed by the null values below it.

Then the outer query uses a window max() to retrieve the (only) non-null value in each group.

This assumes that a column can be used to order the records (I called it id).
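
The two steps can be traced outside the database as well. The following is a rough Python sketch of the same idea, assuming the rows are already sorted by id; the variable names and sample values are made up for illustration.

```python
from itertools import accumulate

# Hypothetical sample rows, assumed to be sorted by the ordering column (id).
values = [
    "91 or more days past due", None, None,
    "61-90 days past due", None,
    "0-30 days past due", None,
]

# Step 1: running count of non-null values,
# the counterpart of count(past_due_col) over(order by id).
grp = list(accumulate(0 if v is None else 1 for v in values))

# Step 2: each group contains exactly one non-null value; look it up per group,
# the counterpart of max(past_due_col) over(partition by grp).
group_value = {}
for g, v in zip(grp, values):
    if v is not None:
        group_value[g] = v

filled = [group_value.get(g) for g in grp]
```

Leading nulls fall into group 0, which has no non-null value, so they stay null here just as they would in the SQL version.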

Demo on DB Fiddle:

ID | PAST_DUE_COL            | grp | new_past_due_col       
-: | :---------------------- | --: | :----------------------
 1 | 91 or more days pastdue |   1 | 91 or more days pastdue
 2 | null                    |   1 | 91 or more days pastdue
 3 | null                    |   1 | 91 or more days pastdue
 4 | 61-90 days past due     |   2 | 61-90 days past due    
 5 | null                    |   2 | 61-90 days past due    
 6 | null                    |   2 | 61-90 days past due    
 7 | 31-60 days past due     |   3 | 31-60 days past due    
 8 | null                    |   3 | 31-60 days past due    
 9 | 0-30 days past due      |   4 | 0-30 days past due     
10 | null                    |   4 | 0-30 days past due     
11 | null                    |   4 | 0-30 days past due     
12 | null                    |   4 | 0-30 days past due     
GMB
3

This is a variation on GMB's answer. It is just a bit simpler:

select t.*,
       max(past_due_col) over(partition by grp) as new_past_due_col
from (select t.*,
             count(past_due_col) over (order by id) as grp
      from mytable t
     ) t;

Note that you need an ordering column of some sort for your question to even make sense.

Another approach uses apply:

select t.*, t2.past_due_col
from mytable t outer apply
     (select top (1) t2.*
      from mytable t2
      where t2.id <= t.id and t2.past_due_col is not null
      order by t2.id desc
     ) t2;
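
For anyone who wants to try the lookup idea without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. SQLite has neither OUTER APPLY nor TOP, so this swaps in a correlated subquery with LIMIT 1, which expresses the same "latest non-null row at or before this id" lookup. The table contents are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, past_due_col TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, past_due_col) VALUES (?, ?)",
    [
        (1, "61-90 days past due"),
        (2, None),
        (3, "31-60 days past due"),
        (4, None),
        (5, None),
    ],
)

# For each row, fetch the value from the latest row at or before it
# whose past_due_col is not null.
rows = conn.execute(
    """
    SELECT t.id,
           (SELECT t2.past_due_col
            FROM mytable t2
            WHERE t2.id <= t.id AND t2.past_due_col IS NOT NULL
            ORDER BY t2.id DESC
            LIMIT 1) AS new_past_due_col
    FROM mytable t
    ORDER BY t.id
    """
).fetchall()
conn.close()
```

The lookup runs once per row, so on large tables an index on the ordering column is likely to matter more here than with the window-function version.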
Gordon Linoff
0

If you have an id column and lead/lag is not available you could use:

SELECT (SELECT TOP 1 PAST_DUE_COL
        FROM MyTablename
        WHERE id <= t.id AND PAST_DUE_COL <> ''
        ORDER BY id DESC) AS new_past_due_col
FROM MyTablename t
JJ32
-1

If the input data is not ordered as in the question, for example as shown below, a solution that relies on that order will break:

PAST_DUE_COL           
61-90 days past due          
Null
Null
91 or more days pastdue   **shifted the row starting with 91 below 61**  
Null
Null
31-60 days past due
Null
0-30 days past due
Null       
Null
Null  

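Adding a generated row number (magic_id) lets the query run without an existing ordering column. Note that ORDER BY (SELECT 1) does not guarantee any particular row order, so the result is only correct if the engine happens to return the rows in the intended order: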

USE stackflow_db;
WITH order_cte AS
(
    SELECT *,
           -- placeholder ordering: generates a row number without a real sort key
           ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS magic_id
    FROM mytable
),
group_data AS
(
    SELECT t.*,
           -- running count of non-null values: each non-null row starts a new group
           SUM(CASE WHEN past_due_col IS NULL THEN 0 ELSE 1 END)
               OVER (ORDER BY magic_id) AS grp
    FROM order_cte AS t
)
SELECT t.*,
       MAX(past_due_col) OVER (PARTITION BY grp) AS new_past_due_col
FROM group_data AS t;
desertnaut