1

Currently using Postgres 9.5

I want to calculate monthly churn_count and churn_rate of the search function.

churn_count: number of users who used the search function last month but not this month
churn_rate: churn_count/total_users_last_month

My dummy data is:

CREATE TABLE yammer_events (
occurred_at TIMESTAMP,
user_id INT,
event_name VARCHAR(50)
);

INSERT INTO yammer_events (occurred_at, user_id, event_name) VALUES 
('2014-06-01 00:00:01', 1, 'search_autocomplete'),
('2014-06-01 00:00:01', 2,  'search_autocomplete'),
('2014-07-01 00:00:01', 1, 'search_run'),
('2014-07-01 00:00:02', 1, 'search_run'),
('2014-07-01 00:00:01', 2,  'search_run'),
('2014-07-01 00:00:01', 3,  'search_run'),
('2014-08-01 00:00:01', 1,  'search_run'),
('2014-08-01 00:00:01', 4,  'search_run');

Ideal output should be:

|month     |churn_count|churn_rate_percentage|
|---       |---        |---                  |
|2014-07-01|0          |0
|2014-08-01|2          |66.6                 |
  • In June: user 1, 2 (2 users)
  • In July: user 1, 2, 3 (3 users)
  • In August: user 1, 4 (2 users)
  • In July, we didn't lose any customer. In August, we lost customer 2 and 3, so the churn_count is 2, and the rate is 2/3*100 = 66.6

I tried the following query to calculate churn_count, but the result is really weird.

WITH monthly_activity AS (
  SELECT distinct DATE_TRUNC('month', occurred_at) AS month,
  user_id
FROM yammer_events
WHERE event_name LIKE 'search%'
)

SELECT last_month.month+INTERVAL '1 month', COUNT(DISTINCT last_month.user_id)
FROM monthly_activity last_month
LEFT JOIN monthly_activity this_month
ON last_month.user_id = this_month.user_id
AND this_month.month = last_month.month + INTERVAL '1 month'
AND this_month.user_id IS NULL
GROUP BY 1

db<>fiddle

Thank you in advance!

Jim Jones
  • 18,404
  • 3
  • 35
  • 44
union77
  • 103
  • 7
  • 1
    Your fiddle suggests you're using PostgreSQL 9.5, which is ancient and no longer supported. Consider an upgrade asap. – Jim Jones Jun 22 '22 at 07:34

2 Answers2

0

An easy way to do it would be to aggregate the users in an array, and from there extract and count the intersection between the current month and the previous one using the window function LAG(), e.g.

WITH j AS (
  SELECT date_trunc('month',occurred_at::date) AS month,
    array_agg(distinct user_id) AS users,
    count(distinct user_id) AS total_users
  FROM yammer_events
  GROUP BY 1
  ORDER BY 1
)
SELECT month::date, 
  cardinality(LAG(users) OVER w - users) AS churn_count,
  (cardinality(LAG(users) OVER w - users)::numeric / 
    LAG(total_users) OVER w::numeric) * 100 AS churn_rate_percentage
FROM j
WINDOW w AS (ORDER BY month
             ROWS BETWEEN 1 PRECEDING AND CURRENT ROW);

   month    | churn_count |  churn_rate_percentage  
------------+-------------+-------------------------
 2014-06-01 |             |                        
 2014-07-01 |           0 |  0.00000000000000000000
 2014-08-01 |           2 | 66.66666666666666666700
(3 rows)

Note: this query relies on the extension intarray. In case you don't have it in your system, just hit:

CREATE EXTENSION intarray;

Demo: db<>fiddle

Jim Jones
  • 18,404
  • 3
  • 35
  • 44
0
WITH monthly_activity AS (
  SELECT distinct DATE_TRUNC('month', occurred_at) AS month,
  user_id
FROM yammer_events
WHERE event_name LIKE 'search%'
)


SELECT
  last_month.month+INTERVAL '1 month', 
  SUM(CASE WHEN this_month.month IS NULL THEN 1 ELSE 0 END) AS churn_count,
  SUM(CASE WHEN this_month.month IS NULL THEN 1 ELSE 0 END)*1.00/COUNT(DISTINCT last_month.user_id)*100 AS churn_rate_percentage
FROM monthly_activity last_month
LEFT JOIN monthly_activity this_month
ON last_month.month + INTERVAL '1 month' = this_month.month
AND last_month.user_id = this_month.user_id
GROUP BY 1
ORDER BY 1
LIMIT 2

I think my way is more circuitous but easier for beginners to understand. Just for your reference.

union77
  • 103
  • 7
  • can you explain why are we added the interval in the outer query? last_month.month+INTERVAL '1 month' – m2rik Jul 19 '22 at 16:09
  • @m2rik because the month and the following month should be joined together to get to know the difference between these two months – union77 Jul 21 '22 at 05:09