2

How can I imitate the LOCF behavior induced by lag(x) ignore nulls on, e.g., Redshift, in Presto?

Take this sample data:

select * from (
  values (7369, null), (7499, 300), (7521, 500),
         (7566, null), (7654, 1400), (7698, null),
         (7782, null), (7788, null), (7839, null),
         (7844, 0), (7876, null), (7900, null),
         (7902, null), (7934, null)
) ex(empno, comm)

-- empno    comm
--  7369    
--  7499    300
--  7521    500
--  7566    
--  7654    1400
--  7698    
--  7782    
--  7788    
--  7839    
--  7844    0
--  7876    
--  7900    
--  7902    
--  7934    

Desired output is:

-- empno    comm     prev_comm
--  7369        
--  7499    300         
--  7521    500      300
--  7566             500
--  7654    1400     500
--  7698            1400
--  7782            1400
--  7788            1400
--  7839            1400
--  7844    0       1400
--  7876               0
--  7900               0
--  7902               0
--  7934               0

This can be nearly achieved by the following (adapted to Presto from here):

select empno, comm, max(comm) over (partition by grp) prev_comm
from ( 
  select empno, comm, sum(cast(comm is not null as double)) over (order by empno) grp
  from example_table
)
order by empno

-- empno    comm     prev_comm
--  7369        
--  7499    300      300
--  7521    500      500
--  7566             500
--  7654    1400    1400
--  7698            1400
--  7782            1400
--  7788            1400
--  7839            1400
--  7844    0          0
--  7876               0
--  7900               0
--  7902               0
--  7934               0

(the difference being that the current rows for non-NULL comm are incorrect)

Actually, in my case, the difference doesn't matter, since I want to coalesce(comm, prev_comm). However, this answer still does not suffice, because in the full data set, it created a memory failure:

Query exceeded local memory limit of 20GB

The following outstanding pull request to presto would implement ignore nulls directly; is there no way to accomplish the equivalent result in the interim?

https://github.com/prestodb/presto/pull/6157

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198

0 Answers0