Calculating rolling retention with Python

Question

I am having trouble calculating rolling retention.

I was trying to figure out how to make groupby work, but it seems to be suited only to calculating classic retention.

Rolling retention: count the number of users from each group who logged in during the given month OR later.

import pandas as pd

data = {'id': [1, 1, 1, 2, 2, 2, 2, 3, 3],
        'group_month': ['2013-05', '2013-05', '2013-05', '2013-06', '2013-06', '2013-06', '2013-06', '2013-06', '2013-06'],
        'login_month': ['2013-05', '2013-06', '2013-07', '2013-06', '2013-07', '2013-09', '2013-10', '2013-09', '2013-10']}

Transforming data:

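data = pd.DataFrame(data)

# pd.to_datetime returns a new Series, so assign the result back to the column
data['group_month'] = pd.to_datetime(data['group_month'], format='%Y-%m', errors='coerce')

data['login_month'] = pd.to_datetime(data['login_month'], format='%Y-%m', errors='coerce')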

To calculate classic retention (count the users from each cohort who logged in during the exact month), I used the following code:

classic_ret = pd.DataFrame(data[(data['login_month'] >= data['group_month'])].groupby(['group_month', 'login_month'])['id'].count())

classic_ret.unstack()
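
A variant of the same calculation, as a minimal sketch: it assumes each user should be counted at most once per month, so it uses nunique() instead of count(), and it keeps the months as 'YYYY-MM' strings, which sort chronologically. The name classic_unique is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'group_month': ['2013-05', '2013-05', '2013-05', '2013-06', '2013-06',
                    '2013-06', '2013-06', '2013-06', '2013-06'],
    'login_month': ['2013-05', '2013-06', '2013-07', '2013-06', '2013-07',
                    '2013-09', '2013-10', '2013-09', '2013-10'],
})

# Keep logins that happen in or after the cohort month, then count
# distinct users per (cohort month, login month) pair.
classic_unique = (
    data[data['login_month'] >= data['group_month']]
    .groupby(['group_month', 'login_month'])['id']
    .nunique()
    .unstack(fill_value=0)  # login months become columns, missing pairs become 0
)

print(classic_unique)
```

Months in which nobody logged in (2013-08 here) do not appear as columns, because unstack only creates columns for login months present in the data.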

Rolling retention should have the following output:

+-------------+---------+---------+---------+---------+---------+---------+
| group_month | 2013-05 | 2013-06 | 2013-07 | 2013-08 | 2013-09 | 2013-10 |
+-------------+---------+---------+---------+---------+---------+---------+
| 2013-05     |       1 |       1 |       1 |       1 |       1 |       1 |
| 2013-06     |       0 |       1 |       1 |       1 |       2 |       2 |
+-------------+---------+---------+---------+---------+---------+---------+

This might help calculate and visualize retention : [link](https://medium.com/@darshildesai/user-retention-in-python-8c33fa5766b6) — DrDEE, Nov 06 '19 at 18:41

moys · Accepted Answer · 2019-09-06T04:33:42.090

With crosstab, I could only manage the table below.

a = data.set_index('login_month').groupby('id').resample('M').last().ffill().drop('id', axis=1).reset_index()

pd.crosstab(a.group_month, a.login_month)

Output

login_month  2013-05-31  2013-06-30  2013-07-31  2013-08-31  2013-09-30  2013-10-31
group_month
2013-05-01            1           1           1           0           0           0
2013-06-01            0           1           1           1           2           2

However, we can get the values you need as follows.


a = data.set_index('login_month').groupby('id').resample('M').last().ffill().drop('id', axis=1).reset_index()
pd.DataFrame(a[(a['login_month'] >= a['group_month'])].groupby(['group_month', 'login_month'])['id'].count()).unstack().fillna(method='ffill',axis=1).fillna(value=0)

Output

login_month  2013-05-31  2013-06-30  2013-07-31  2013-08-31  2013-09-30  2013-10-31
group_month
2013-05-01          1.0         1.0         1.0         1.0         1.0         1.0
2013-06-01          0.0         1.0         1.0         1.0         2.0         2.0
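
As an alternative, here is a minimal sketch that follows the definition in the question directly: a user counts toward month M if M is no earlier than their cohort month and no later than their last login. It keeps the months as 'YYYY-MM' strings (which sort chronologically), and the names users, months and rolling_ret are only illustrative. Because it counts every user from the cohort month onward, its numbers are not expected to match the tables above exactly.

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'group_month': ['2013-05', '2013-05', '2013-05', '2013-06', '2013-06',
                    '2013-06', '2013-06', '2013-06', '2013-06'],
    'login_month': ['2013-05', '2013-06', '2013-07', '2013-06', '2013-07',
                    '2013-09', '2013-10', '2013-09', '2013-10'],
})

# One row per user: their cohort month and the last month they logged in.
users = data.groupby('id').agg(
    group_month=('group_month', 'first'),
    last_login=('login_month', 'max'),
)

# Every calendar month covered by the data, as 'YYYY-MM' strings.
months = pd.period_range(
    data['group_month'].min(), data['login_month'].max(), freq='M'
).strftime('%Y-%m')

# For each month, count per cohort the users whose cohort has started
# and whose last login falls in that month or later.
rolling_ret = pd.DataFrame({
    m: users[(users['group_month'] <= m) & (users['last_login'] >= m)]
       .groupby('group_month').size()
    for m in months
}).fillna(0).astype(int)

print(rolling_ret)
```

With the sample data, the 2013-05 cohort drops to 0 after 2013-07 (user 1's last login), while the 2013-06 cohort stays at 2 through 2013-10 because both of its users log in that month.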

Hello, thank you for your answer! But the result is the same as in classic retention. I will add some data to the dataset; maybe my question is unclear — , Sep 06 '19 at 02:02
