Replace NULL with values

Question

Here is my challenge: I have a log table which every time a record is changed adds a new record but puts a NULL value for each non-changed value in each record. In other words only the changed value is set, the rest unchanged fields in each row simply has a NULL value. Now I would like to replace each NULL value with the value above it that is NOT a NULL value like below:

Source table: Task_log

ID  Owner       Status      Flag
1   Bob         Registrar   T
2   Sue         NULL        NULL
3   NULL        NULL        F
4   Frank       Admission   T
5   NULL        NULL        F
6   NULL        NULL        T

Desired output table: Task_log

ID  Owner       Status      Flag
1   Bob         Registrar   T
2   Sue         Registrar   T
3   Sue         Registrar   F
4   Frank       Admission   T
5   Frank       Admission   F
6   Frank       Admission   T

How do I write a query which will generate the desired output table?

Why are you interesting in writing a query to fix the junk data instead up fixing the code that actually inserts the record? This is a bad time to use a bandaid... Fix the root problem. — Abe Miessler, May 09 '14 at 21:26
I am using SQL Server 2012. I have no access to the code hence I cannot fix the root problem hence I have to be creative writing a cool query which will produce the desired output. — Flemming Hald, May 09 '14 at 21:28
I haven't used them myself, so I'm not comfortable yet writing out a full answer, but take a look at **[LAG](http://technet.microsoft.com/en-us/library/hh231256.aspx)**, **[LEAD](http://technet.microsoft.com/en-us/library/hh213125.aspx)**, and **[LAST_VALUE](http://technet.microsoft.com/en-us/library/hh231517.aspx)** (Require Sql Server 2012 and later) — Joel Coehoorn, May 09 '14 at 21:54
possible duplicate of [replace NULL values with latest non-NULL value in resultset series (SQL Server 2008 R2)](http://stackoverflow.com/questions/7045040/replace-null-values-with-latest-non-null-value-in-resultset-series-sql-server-2) — Serpiton, May 09 '14 at 22:04

score 2 · Accepted Answer · answered May 09 '14 at 22:28

One the new windowed function of SQLServer 2012 is FIRST_VALUE, wich have quite a direct name, it can be partitioned through the OVER clause, before using it is necessary to divide every column in data block, a block for a column begin when a value is found.

With Block As (
  Select ID
       , Owner
       , OBlockID = SUM(Case When Owner Is Null Then 0 Else 1 End) 
                    OVER (ORDER BY ID)
       , Status
       , SBlockID = SUM(Case When Status Is Null Then 0 Else 1 End) 
                    OVER (ORDER BY ID)
       , Flag
       , FBlockID = SUM(Case When Flag Is Null Then 0 Else 1 End) 
                    OVER (ORDER BY ID)
  From   Task_log
)
Select ID
     , Owner = FIRST_VALUE(Owner) OVER (PARTITION BY OBlockID ORDER BY ID)
     , Status = FIRST_VALUE(Status) OVER (PARTITION BY SBlockID ORDER BY ID)
     , Flag = FIRST_VALUE(Flag) OVER (PARTITION BY FBlockID ORDER BY ID)
FROM   Block

SQLFiddle demo

The UPDATE query is easily derived

You don't need `FIRST_VALUE`, a simple `MIN(Owner) OVER (PARTITION BY OBlockID)` also works. And it would be really simple if SQL Server supported the Standard SQL `IGNORE NULLS` option for `LEAD` — dnoeth, May 10 '14 at 11:36

score 0 · Answer 2 · answered May 09 '14 at 21:31

As I mentioned in my comment, I would try to fix the process that is creating the records rather than fixing the junk data. If that is not an option, the code below should get you pointed in the right direction.

UPDATE   t1
set      t1.owner = COALESCE(t1.owner, t2.owner),
         t1.Status = COALESCE(t1.status, t2.status),
         t1.Flag = COALESCE(t1.flag, t2.flag)
FROM     Task_log as t1
INNER JOIN    Task_log as t2
         ON t1.id = (t1.id + 1)
where    t1.owner is null 
         OR t1.status is null 
         OR t1.flag is null

If the table has more then one line gap in the data the second line will still show `NULL`, for example the row with ID 6 in the OP data will be unchanged — Serpiton, May 09 '14 at 22:39

score 0 · Answer 3 · edited May 23 '17 at 12:05

I can think of several approaches.

You could use a combination of COALESCE with an array aggregate function. Unfortunately it doesn't look like SQL Server supports array_agg natively (although some nice people have developed some workarounds).

You could also use a subselect for each column.

SELECT id,
       (SELECT TOP 1 FROM (SELECT owner FROM ... WHERE id = outer_id AND owner IS NOT NULL order by ID desc )) AS owner,
       -- other columns

You could probably do something with window functions, too.

score 0 · Answer 4 · answered May 09 '14 at 23:47

0

A vanilla solution would be:

select id , owner , coalesce(owner, ( select owner from t t2 where id = (select max(id) from t t3 where id < t1.id and owner is not null)) ) as new_owner , flag , coalesce(flag, ( select flag from t t2 where id = (select max(id) from t t3 where id < t1.id and flag is not null)) ) as new_flag from t t1

Rather inefficient, but should work on most DBMS

answered May 09 '14 at 23:47

Lennart - Slava Ukraini

6,936
1
20
32

Lennart you pointed me in the right direction. This solution worked for me: – Flemming Hald Jun 18 '14 at 14:01
you pointed me in the right direction. This solution worked for me: declare @a numeric(3) declare @b numeric(30) declare @c varchar(20) declare @d varchar(1) ;WITH q AS ( select top 10000000* from [TABLE_NAME] order by [FIELD1, FIELD2] ) update q set @a = coalesce(id, @a), -- set status_id variable @b = coalesce(owner_id, @b), @c = coalesce(status_id, @c), @d = coalesce(student_read_flag, @d), id = coalesce(id, @a), owner_id = coalesce(owner_id, @b), status_id = coalesce(assigned_to_employee_id, @c), student_read_flag = coalesce(student_read_flag, @d) – Flemming Hald Jun 18 '14 at 14:20

Replace NULL with values

4 Answers4