6

I have a time series that with null values. I want to be replace each null value with the most recent non-non value. From what I've researched, Oracle SQL can easily accomplish this using Last_value with IGNORE NULLS. Is there a similar way to accomplish this using SQL Server 2016? Otherwise I'm just going to code it using C#, but felt using SQL would be faster, cleaner, and easier.

Sec SCORE
1   Null
2   Null
3   5
4   Null
5   8
6   7
7   Null

Should be replaced with:

Sec SCORE
1   Null
2   Null
3   5
4   5
5   8
6   7
7   7
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
Trevor D
  • 63
  • 1
  • 1
  • 3
  • 2
    Possible duplicate of [How to get Previous Value for Null Values](https://stackoverflow.com/questions/16669620/how-to-get-previous-value-for-null-values) – ahmed abdelqader Jul 03 '17 at 21:22

2 Answers2

10

You can do this with two cumulative operations:

select t.*,
       coalesce(score, max(score) over (partition by maxid)) as newscore
from (select t.*,
             max(case when score is not null then id end) over (order by id) as maxid
      from t
     ) t;

The innermost subquery gets the most recent id where there is a value. The outermost one "spreads" that value to the subsequent rows.

If you actually want to update the table, you can incorporate this easily into an update. But, Oracle cannot do that (easily), so I'm guessing this is not necessary....

christiaantober
  • 251
  • 3
  • 10
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Genius! Does exactly what I need. However, I plan on doing this often and on millions of rows. It took about 1 minute to do the above on 3 million records. Would I get significant speed benefits if I switched to Oracle Express and used last_value with IGNORE NULLS? – Trevor D Jul 04 '17 at 23:48
  • @TrevorD . . . You would have to test that yourself. Of course, running two window functions takes longer than running one; I'm not sure if it worth the effort to switch databases for that gain. – Gordon Linoff Jul 05 '17 at 00:30
  • 1
    Damn, this is really good. Sadly I could never write such good SQL like this :@ – Basssprosse Jun 21 '21 at 11:48
  • It only taking max values not if there any min value before null, ideally it should capture before null values – Mayank Awasthi Sep 10 '21 at 15:25
  • @MayankAwasthi . . . This answers the question that was asked here. If you have your own question, then ask as a new question. And by the way, this is not calculating a cumulative max. Look at the data in the question and this does what the OP wants. – Gordon Linoff Sep 10 '21 at 20:52
3

If performance is an issue, I suggest the solution from this article:

The Last non NULL Puzzle

His final solution, while dense, does perform excellently with a linear query plan without any joins. Here is an example implementation I've used which carries the last customer name through a type2 scd staging table. In this staging table, NULL represents no update, and '*** DELETED ***' represents an explicit set to NULL. The following cleans this up to resemble an actual SCD record excellently:

WITH [SampleNumbered] AS (
    SELECT  *, ROW_NUMBER() OVER ( PARTITION BY [SampleId] ORDER BY [StartDatetime] ) AS [RowNumber]
    FROM [dbo].[SampleDimStage]
), [SamplePrep] AS (
    SELECT  [SampleId]
        ,   [StartDatetime]
        ,   CAST([RowNumber] AS BINARY(8)) + CAST([SampleGroupId] AS VARBINARY(255)) AS [BinarySampleGroupId]
        ,   CAST([RowNumber] AS BINARY(8)) + CAST([SampleStatusCode] AS VARBINARY(255)) AS [BinarySampleStatusCode]
    FROM [SampleNumbered]
), [SampleCleanUp] AS (
    SELECT  [SampleId]
        ,   [StartDatetime]
        ,   CAST(SUBSTRING(MAX([BinarySampleGroupId]) OVER( PARTITION BY [SampleId] ORDER BY [StartDatetime] )
                , 9, 255) AS VARCHAR(255)) AS [LastSampleGroupId]
        ,   CAST(SUBSTRING(MAX([BinarySampleStatusCode]) OVER( PARTITION BY [SampleId] ORDER BY [StartDatetime] )
                , 9, 255) AS VARCHAR(255)) AS [LastSampleStatusCode]
        ,   LEAD([StartDatetime]) OVER( PARTITION BY [SampleId] ORDER BY [StartDatetime] ) AS [EndDatetime]
    FROM [SamplePrep]
)
SELECT  CAST([SampleId] AS NUMERIC(18)) AS [SampleId]
    ,   CAST(NULLIF([sc].[LastSampleGroupId],'*** DELETED ***') AS NUMERIC(18)) AS [GroupId]
    ,   CAST(NULLIF([sc].[LastSampleStatusCode],'*** DELETED ***') AS CHAR(3)) AS [SampleStatusCode]
    ,   [StartDatetime]
    ,   [sc].[EndDatetime]
FROM [SampleCleanUp] [sc];

If your sort key is some sort of integer, you can completely skip the first CTE and cast that directly to binary.

Apneal
  • 483
  • 4
  • 6