4

Vertica has a very nice type of operations: Event-Based Window operations, which basically let you identify when an event occurs. For example the conditional_true_event will increment a counter each time the given boolean expression resolves to true. We use this kind of approach heavily.

We are thinking about moving to RedShift, but we would need a similar function. RedShift has some nice window functions, but I can't find this one.

Is there any way I can emulate this function using RedShift?

realr
  • 3,652
  • 6
  • 23
  • 34
JSBach
  • 4,679
  • 8
  • 51
  • 98

1 Answers1

7

The CONDITIONAL_TRUE_EVENT() is rather easy to write with window functions. It's just a COUNT with a conditional (CASE):

SELECT ts, symbol, bid,
       CONDITIONAL_TRUE_EVENT(bid > 10.6)  
           OVER (ORDER BY ts) AS oce
FROM Tickstore3 
ORDER BY ts ;

becomes:

SELECT ts, symbol, bid,
       COUNT(CASE WHEN bid > 10.6 THEN 1 END)  
           OVER (ORDER BY ts
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS oce
FROM Tickstore3 
ORDER BY ts ;

The CONDITIONAL_CHANGE_EVENT() is more complicated because it needs to use the previous value. It can be emulated using LAG() and SUM() or COUNT() (or ROW_NUMBER()). But it will require I think a CTE or a derived table (or a self-join):

SELECT ts, symbol, bid,
       CONDITIONAL_CHANGE_EVENT(bid)  
           OVER (ORDER BY ts) AS cce
FROM Tickstore3 
ORDER BY ts ;

will become:

WITH emu AS
  ( SELECT ts, symbol, bid,
           CASE WHEN bid <> LAG(bid) OVER (ORDER BY ts) 
               THEN 1 
           END AS change_bid
    FROM Tickstore3
  ) 
SELECT ts, symbol, bid,
       COUNT(change_bid) 
           OVER (ORDER BY ts
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
         AS cce
FROM emu 
ORDER BY ts ;

I don't know how this CONDITIONAL_CHANGE_EVENT() function behaves with nulls. If there are NULL values in the checked for changes column - and you want to see if there is a change from the last non-null value and not just the previous one - the rewrite will be even more complicated.


Edit: As far as I understand Redshift's documentation an explicit window frame (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) is required for window aggregates when there is an ORDER BY. So, you can/have to use that (or whatever the default frame is in Vertica for these cases. It's either the above or with RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235