5

Consider a table like with the following data

column_a (boolean) | column_order (integer)
TRUE               |     1
NULL               |     2
NULL               |     3
TRUE               |     4
NULL               |     5
FALSE              |     6
NULL               |     7

I would like to write a queries that replaces each NULL value in column_a with the last non-NULL value out of the previous values of the column according to the order specified by column_order The result should look like:

column_a (boolean) | column_order (integer)
TRUE               |     1
TRUE               |     2
TRUE               |     3
TRUE               |     4
TRUE               |     5
FALSE              |     6
FALSE              |     7

For simplicity, we can assume that the first value is never null. The following works if there are no more than one consecutive NULL values:

SELECT
  COALESCE(column_a, lag(column_a) OVER (ORDER BY column_order))
FROM test_table
ORDER BY column_order;

However, the above does not work for an arbitrary number of consecutive NULL values. What is a Postgres query that is able to achieve the results above? Is there an efficient query that scales well to a large number of rows?

Marco
  • 3,053
  • 5
  • 27
  • 29

5 Answers5

3

You can use a handy trick where you sum over a case to create partitions based on the divisions between null and non-null series, then first_value to bring them forward.

e.g.

select
  *,
  sum(case when column_a is not null then 1 else 0 end)
    OVER (order by column_order) as partition
from table1;

 column_a | column_order | partition 
----------+--------------+-----------
 t        |            1 |         1
          |            2 |         1
          |            3 |         1
 t        |            4 |         2
          |            5 |         2
 f        |            6 |         3
          |            7 |         3
(7 rows)

then

select
  first_value(column_a)
    OVER (PARTITION BY partition ORDER BY column_order),
  column_order
from (
    select
      *,
      sum(case when column_a is not null then 1 else 0 end)
        OVER (order by column_order) as partition
    from table1
) partitioned;

gives you:

 first_value | column_order 
-------------+--------------
 t           |            1
 t           |            2
 t           |            3
 t           |            4
 t           |            5
 f           |            6
 f           |            7
(7 rows)
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
1

Not sure if Postgresql supports this, but give it a try:

SELECT
  COALESCE(column_a, (select t2.column_a from test_table t2
                      where t2.column_order < t1.column_order
                        and t2.column_a is not null
                      order by t2.column_order desc
                      fetch first 1 row only))
FROM test_table t1
ORDER BY column_order;
jarlh
  • 42,561
  • 8
  • 45
  • 63
1

I'm more familiar with SqlServer, but this should do what you need.

update  tableA as a2
set column_a = b2.column_a
from (
  select a.column_order, max(b.column_order) from tableA as a
  inner join tableA as b on a.column_order > b.column_order and b.column_a is not null
  where a.column_a is null
  group by a.column_order
) as junx 
inner join tableA as b2 on junx.max =b2.column_order
where a2.column_order = junx.column_order

SQL Fiddle

Kevin
  • 7,162
  • 11
  • 46
  • 70
0

Use this:

select 
   case when column_a is null then 
      (select top 1 column_a from myTable where column_order < mt.column_order
         AND column_a is not null 
       order by column_order desc)
   else column_a
   end,
   column_order
   from myTable mt

here is a fiddle for it.

Taher Rahgooy
  • 6,528
  • 3
  • 19
  • 30
0

Below query is working on Postgresql. You should think about adding an index on column_order otherwise execution might be very slow.

EDIT: on a 100.000 records table, it took 3661 ms while "select * from test_table" took 2511 ms. Performance is very good when you put index on column_a. (statistics are taken on client side)

select 
    case
        when a.column_a is null then b.column_a
        else a.column_a
    end,
    a.column_order
from test_table a
left join test_table b on b.column_order = (
        select max(column_order) 
        from test_table c 
        where c.column_order < a.column_order and c.column_a is not null)
order by column_order
Abdullah Nehir
  • 1,027
  • 13
  • 23