0

I am trying to calculate the churn rate from a data that has customer_id, group, date. The aggregation is going to be by id, group and date. The churn formula is (customers in previous cohort - customers in last cohort)/customers in previous cohort

customers in previous cohort refers to cohorts in before 28 days customers in last cohort refers to cohorts in last 28 days

I am not sure how to aggregate them by date range to calculate the churn.

Here is sample data that I copied from SQL Group by Date Range:

Date        Group    Customer_id
2014-03-01  A         1
2014-04-02  A         2
2014-04-03  A         3
2014-05-04  A         3
2014-05-05  A         6
2015-08-06  A         1
2015-08-07  A         2
2014-08-29  XXXX      2
2014-08-09  XXXX      3
2014-08-10  BB        4
2014-08-11  CCC       3
2015-08-12  CCC       2
2015-03-13  CCC       3
2014-04-14  CCC       5
2014-04-19  CCC       4
2014-08-16  CCC       5
2014-08-17  CCC       3
2014-08-18  XXXX      2
2015-01-10  XXXX      3
2015-01-20  XXXX      4
2014-08-21  XXXX      5
2014-08-22  XXXX      2
2014-01-23  XXXX      3
2014-08-24  XXXX      2
2014-02-25  XXXX      3
2014-08-26  XXXX      2
2014-06-27  XXXX      4
2014-08-28  XXXX      1
2014-08-29  XXXX      1
2015-08-30  XXXX      2
2015-09-31  XXXX      3

The goal is to calculate the churn rate every 28 days in between 2014 and 2015 by the formula given above. So, it is going to be aggregating the data by rolling it by 28 days and calculating the churn by the formula.

Here is what I tried to aggregate the data by date range:

SELECT COUNT(distinct customer_id) AS count_ids, Group,
        DATE_SUB(CAST(Date AS DATE), INTERVAL 56 DAY) AS Date_min,
        DATE_SUB(CURRENT_DATE, INTERVAL 28 DAY) AS Date_max
 FROM churn_agg
 GROUP BY count_ids, Group, Date_min, Date_max

Hope someone will help me with aggregation and churn calculation. I want to simply deduct the aggregated count_ids to deduct it from the next aggregated count_ids which is after 28 days. So this is going to be successive deduction of the same column value (count_ids). I am not sure if I have to use rolling window or simple aggregation to find the churn.

G1124E
  • 407
  • 1
  • 10
  • 20
  • You are also working in postgresql ? – VBoka Feb 10 '20 at 17:20
  • Here is a demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=bae171f85ecffda04e610f4744b5290f maybe it will help to someone , I do not understand what OP wants... – VBoka Feb 10 '20 at 17:33
  • 1
    Sample data is great, but the (matching) expected result is also needed. – jarlh Feb 10 '20 at 17:39
  • You need a help table (or cte) with start and end dates for _every 28 days in between 2014 and 2015_. Outer join. – jarlh Feb 10 '20 at 17:48
  • @jarlh, thank you for your comment. I can create cte table, however, I want help with the logic on aggregating it by 28 days. – G1124E Feb 10 '20 at 17:53
  • group by startdate,enddate... – jarlh Feb 10 '20 at 17:59
  • @jarlh I have update the script and the last paragraph. Hope this helps explaining what I am looking for. How do I simply calculate the 28-day difference of the aggregated count_ids. – G1124E Feb 10 '20 at 18:40

1 Answers1

0

As corrected by @jarlh, it's not 2015-09-31 but 2015-09-30 You can use this to create 28 days calendar:

create table daysby28  (i int, _Date date);
insert into daysby28  (i, _Date)
SELECT i, cast('01-01-2014'as date) + i*INTERVAL '28 day'
from generate_series(0,50) i
order by 1;

After you use @jarlh churn_agg table creation he sent with the fiddle, with this query, you get what you want:

with cte as
(
  select count(Customer) as TotalCustomer, Cohort, CohortDateStart From
  (  
    select distinct a.Customer_id as Customer, b.i as Cohort, b._Date as CohortDateStart
    from churn_agg a left join daysby28 b on a._Date >=  b._Date and a._Date <  b._Date + INTERVAL '28 day' 
  ) a
  group by Cohort, CohortDateStart
)
select a.CohortDateStart, 
1.0*(b.TotalCustomer - a.TotalCustomer)/(1.0*b.TotalCustomer) as Churn from cte a
left join cte b on a.cohort > b.cohort
and not exists(select 1 from cte c where c.cohort > b.cohort and c.cohort < a.cohort)
order by 1  

The fiddle of all together is here

zip
  • 3,938
  • 2
  • 11
  • 19