0

I have the following:

with t as (
      SELECT advertisable, EXTRACT(YEAR from day) as yy, EXTRACT(MONTH from day) as mon, 
             ROUND(SUM(cost)/1e6) as val
      FROM adcube dac
      WHERE advertisable IN (SELECT advertisable
                                 FROM adcube dac 
                                 GROUP BY advertisable
                                 HAVING SUM(cost)/1e6 > 100
                                )
      GROUP BY advertisable, EXTRACT(YEAR from day), EXTRACT(MONTH from day)
     )
select advertisable, min(yy * 10000 + mon) as yyyymm
from (select t.*,
             (row_number() over (partition by advertisable order by yy, mon) -
              row_number() over (partition by advertisable, val order by yy, mon)
             ) as grp
      from t
     )as foo
group by advertisable, grp, val
having count(*) >= 6 and val = 0
;  

This tracks the activation date of an account that stops spend for 4 months. However I would like to track the reactivation date instead. So if an account starts spend again after 4 months I can see the new start date for that account?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
user3207341
  • 69
  • 1
  • 2
  • 9

2 Answers2

1

You want to find accounts where val > 0 and there are 4 (or 6) preceding records with 0s.

Here is an idea:

  • Calculate the groups of similar values as in your query.
  • Assign a sequential number to each group (val_seqnum).
  • Then pull the previous value and sequence number for each record.

Now, you want the records where the following is true:

  • val > 0
  • prev_val = 0
  • The previous val_seqnum >= 4 (or whatever your threshold).

The following query should do this (assuming the same definition of t):

select t.*
from (select t.* ,
             lag(val) over (partition by advertisable order by yy, mon) prev_val,
             lag(val_seqnum) over (partition by advertisable order by yy, mon) as prev_val_seqnum
      from (select t.*,
                   row_number() over (partition by advertisable, val, grp order by yy, mon) as val_seqnum
                  ) as grp
            from (select t.*,
                         (row_number() over (partition by advertisable order by yy, mon) -
                          row_number() over (partition by advertisable, val order by yy, mon)
                         ) as grp
                  from t
                 ) t
           ) t
     ) t
where val > 0 and prev_val = 0 and prev_val_seqnum >= 4;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

I think this can be radically simpler (and faster):

SELECT advertisable, ym AS reactivation_ym
FROM (
   SELECT advertisable
        , date_trunc('month', day) AS ym
        , SUM(cost) < 500000       AS asleep
        , count(SUM(cost) < 500000 OR NULL)
                OVER (PARTITION BY advertisable
                      ORDER BY date_trunc('month', day)
                      ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS ct
   FROM   adcube dac
   JOIN  (
      SELECT advertisable
      FROM   adcube
      GROUP  BY 1
      HAVING SUM(cost) > 1e8   -- really 10000000 ?
      ) x USING (advertisable)
   GROUP BY 1, 2
   ) sub
WHERE  NOT asleep
AND    ct = 4;

Building on a couple of assumptions to fill in for missing information.
I largely untangled your calculations and simplified the code making it shorter and faster than your original.

  • Count for each advertisable how many of the the last 4 months had a total cost below 500000. Only with all 4 (existing) months below the threshold, the row qualifies. (If you don't have rows for all months, you need to decide how to handle missing rows. Information is not available in your question.)

Using count() as window aggregate function with a custom frame. Here is a recent related answer with detailed explanation:

How can you "nest" count() and sum()?
They are not really nested. It's a window function over an aggregate function. Details:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks this is much faster alright, so when you say 'ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING' this is checking 4 months with no spend and then the 1 month after to see if still 0? – user3207341 Nov 21 '14 at 10:38
  • @user3207341: Actually, the check that the *current* row is spending again was missing. Amended. – Erwin Brandstetter Nov 21 '14 at 13:50