2

I am having a problem with column aliases on Amazon Redshift when using window functions, wherein if the window encounters a null row (even with ignore nulls) the column alias doesn't work.

Let's start with what does work:

with my_data as(
  select 1 as rowno, 100 as value
  union all select '2',200
  union all select '3',300
  union all select '4',400  
 )
 select rowno, value,
  lead(value,1) IGNORE NULLS over (order by rowno ASC) - value as row_delta,
  row_delta as alias_value
 from my_data
  order by rowno;

This query produces a result which I would expect:

rowno | value | row_delta | alias_value
------+-------+-----------+------------
    1 |   100 |       100 |         100 <- alias value is same as row_delta
    2 |   200 |       100 |         100 <- yep
    3 |   300 |       100 |         100 <- yep
    4 |   400 |           |             <- expected because lead() has hit the end of the table

Now, let's add a null value in the middle of the data...

with my_data as(
  select 1 as rowno, 100 as value
  union all select '2',200
  union all select '3',null
  union all select '4',400
  union all select '5',500  
 )
 select rowno, value,
   lead(value,1) IGNORE NULLS over (order by rowno ASC) - value as row_delta,
   row_delta as alias_value
 from my_data
  order by rowno;

The result looks like this:

rowno | value | row_delta | alias_value
------+-------+-----------+------------
    1 |   100 |       100 |         100  <- this looks ok
    2 |   200 |       200 |              <- alias value should also be 200, but it's null
    3 |       |           |              <- this is the null row that's causing the problem
    4 |   400 |       100 |         100  <- now we're ok again
    5 |   500 |           |              <- as expected because we stepped off the end of the table

As you can see, if the window function hits a null row, the alias just disappears.

Do I just misunderstand how column aliases work, or is this a bug in Redshift?

Why do I want to do this? Well, I want to filter my row deltas because occasionally my data has crazy things in it which you can't see until after the row comparison is done by the window function. However, you can't use window functions in where clauses or 'having' clauses, so I can't apply filters there. Creating a column alias for the window result, and then using case statements to carry a value forward or not allows me to apply data filters inline to the select clause...when it works.

For example (pseudo-query):

select lead(value) - value over ([partition order by]) as row_delta
  (case when row_delta between [reasonable values] then row_delta else null end) as clean_delta
from my_data

This works fine until it hits a null - and then clean_delta ends up null, even though row delta successfully ignores nulls to get a value.

Any insights or workarounds appreciated.

Chris
  • 21
  • 2
  • 1
    Does redshift violate all-at-once rule: https://stackoverflow.com/a/33521767/5070879? – Lukasz Szozda Dec 12 '19 at 16:27
  • It indeed sounds very strange. A possible workaround could be to use w `WITH` syntax, and use a window function in a first cte, and use those values in expressions in the following cte where they will be just normal columns and not a _lateral column alias references_. – botchniaque Dec 12 '19 at 23:31
  • @LukaszSzozda - thanks for that, was not aware. However, my query works when there are no nulls present in the data, so if all-at-once were the cause, I'd think neither use case should work (windows and column aliases, with or without nulls present). – Chris Dec 13 '19 at 00:49
  • Amazon tech support has confirmed this is a bug in RedShift. They have logged it with the tech team but cannot say when it may be addressed. I solved my problem a different way by applying my noise-filter at a later step of my data processing. – Chris Dec 16 '19 at 15:08

0 Answers0