How can I imitate the LOCF behavior induced by lag(x) ignore nulls
on, e.g., Redshift, in Presto?
Take this sample data:
select * from (
values (7369, null), (7499, 300), (7521, 500),
(7566, null), (7654, 1400), (7698, null),
(7782, null), (7788, null), (7839, null),
(7844, 0), (7876, null), (7900, null),
(7902, null), (7934, null)
) ex(empno, comm)
-- empno comm
-- 7369
-- 7499 300
-- 7521 500
-- 7566
-- 7654 1400
-- 7698
-- 7782
-- 7788
-- 7839
-- 7844 0
-- 7876
-- 7900
-- 7902
-- 7934
Desired output is:
-- empno comm prev_comm
-- 7369
-- 7499 300
-- 7521 500 300
-- 7566 500
-- 7654 1400 500
-- 7698 1400
-- 7782 1400
-- 7788 1400
-- 7839 1400
-- 7844 0 1400
-- 7876 0
-- 7900 0
-- 7902 0
-- 7934 0
This can be nearly achieved by the following (adapted to Presto from here):
select empno, comm, max(comm) over (partition by grp) prev_comm
from (
select empno, comm, sum(cast(comm is not null as double)) over (order by empno) grp
from example_table
)
order by empno
-- empno comm prev_comm
-- 7369
-- 7499 300 300
-- 7521 500 500
-- 7566 500
-- 7654 1400 1400
-- 7698 1400
-- 7782 1400
-- 7788 1400
-- 7839 1400
-- 7844 0 0
-- 7876 0
-- 7900 0
-- 7902 0
-- 7934 0
(the difference being that the current rows for non-NULL
comm
are incorrect)
Actually, in my case, the difference doesn't matter, since I want to coalesce(comm, prev_comm)
. However, this answer still does not suffice, because in the full data set, it created a memory failure:
Query exceeded local memory limit of 20GB
The following outstanding pull request to presto
would implement ignore nulls
directly; is there no way to accomplish the equivalent result in the interim?