I am having a problem with column aliases on Amazon Redshift when using window functions, wherein if the window encounters a null row (even with ignore nulls) the column alias doesn't work.
Let's start with what does work:
with my_data as(
select 1 as rowno, 100 as value
union all select '2',200
union all select '3',300
union all select '4',400
)
select rowno, value,
lead(value,1) IGNORE NULLS over (order by rowno ASC) - value as row_delta,
row_delta as alias_value
from my_data
order by rowno;
This query produces a result which I would expect:
rowno | value | row_delta | alias_value
------+-------+-----------+------------
1 | 100 | 100 | 100 <- alias value is same as row_delta
2 | 200 | 100 | 100 <- yep
3 | 300 | 100 | 100 <- yep
4 | 400 | | <- expected because lead() has hit the end of the table
Now, let's add a null value in the middle of the data...
with my_data as(
select 1 as rowno, 100 as value
union all select '2',200
union all select '3',null
union all select '4',400
union all select '5',500
)
select rowno, value,
lead(value,1) IGNORE NULLS over (order by rowno ASC) - value as row_delta,
row_delta as alias_value
from my_data
order by rowno;
The result looks like this:
rowno | value | row_delta | alias_value
------+-------+-----------+------------
1 | 100 | 100 | 100 <- this looks ok
2 | 200 | 200 | <- alias value should also be 200, but it's null
3 | | | <- this is the null row that's causing the problem
4 | 400 | 100 | 100 <- now we're ok again
5 | 500 | | <- as expected because we stepped off the end of the table
As you can see, if the window function hits a null row, the alias just disappears.
Do I just misunderstand how column aliases work, or is this a bug in Redshift?
Why do I want to do this? Well, I want to filter my row deltas because occasionally my data has crazy things in it which you can't see until after the row comparison is done by the window function. However, you can't use window functions in where clauses or 'having' clauses, so I can't apply filters there. Creating a column alias for the window result, and then using case statements to carry a value forward or not allows me to apply data filters inline to the select clause...when it works.
For example (pseudo-query):
select lead(value) - value over ([partition order by]) as row_delta
(case when row_delta between [reasonable values] then row_delta else null end) as clean_delta
from my_data
This works fine until it hits a null - and then clean_delta ends up null, even though row delta successfully ignores nulls to get a value.
Any insights or workarounds appreciated.