I am having trouble calculating rolling retention.
I tried to make groupby work for this, but it seems suited only to calculating classic retention.
Rolling retention: count the users from each group who logged in in a given month OR later.
data = {'id':[1, 1, 1, 2, 2, 2, 2, 3, 3],
'group_month': ['2013-05', '2013-05', '2013-05', '2013-06', '2013-06', '2013-06', '2013-06', '2013-06', '2013-06'],
'login_month': ['2013-05', '2013-06', '2013-07', '2013-06', '2013-07', '2013-09', '2013-10', '2013-09', '2013-10']}
Transforming data:
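import pandas as pd
data = pd.DataFrame(data)
# pd.to_datetime returns a new Series, so the result has to be assigned back to the column
data['group_month'] = pd.to_datetime(data['group_month'], format='%Y-%m', errors='coerce')
data['login_month'] = pd.to_datetime(data['login_month'], format='%Y-%m', errors='coerce')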
To calculate classic retention (count the users from each cohort who logged in in the exact month), I used the following code:
classic_ret = pd.DataFrame(data[(data['login_month'] >= data['group_month'])].groupby(['group_month', 'login_month'])['id'].count())
classic_ret.unstack()
Rolling retention should have the following output:
+-------------+---------+---------+---------+---------+---------+---------+
| group_month | 2013-05 | 2013-06 | 2013-07 | 2013-08 | 2013-09 | 2013-10 |
+-------------+---------+---------+---------+---------+---------+---------+
| 2013-05     |       1 |       1 |       1 |       1 |       1 |       1 |
| 2013-06     |       0 |       1 |       1 |       1 |       2 |       2 |
+-------------+---------+---------+---------+---------+---------+---------+
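The numbers in the table above are consistent with counting, for each cohort, the distinct users whose first login falls in that month or an earlier one. Assuming that reading, here is a minimal sketch that reproduces the table: take each user's first login month, count those per cohort and month, fill in the months with no logins, and take a running total across the columns. The names `valid`, `first_login`, `months`, `new_users` and `rolling_ret` are illustrative only, and the months are kept as 'YYYY-MM' strings because they already compare and sort chronologically.

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'group_month': ['2013-05', '2013-05', '2013-05', '2013-06', '2013-06',
                    '2013-06', '2013-06', '2013-06', '2013-06'],
    'login_month': ['2013-05', '2013-06', '2013-07', '2013-06', '2013-07',
                    '2013-09', '2013-10', '2013-09', '2013-10'],
})

# Keep only logins that happen in the cohort month or later.
valid = data[data['login_month'] >= data['group_month']]

# Earliest login month of each user within its cohort.
first_login = (
    valid.sort_values('login_month')
         .drop_duplicates(subset=['group_month', 'id'])
)

# Every month between the earliest and latest login, including months
# in which nobody logged in (2013-08 in this data).
months = pd.period_range(
    data['login_month'].min(), data['login_month'].max(), freq='M'
).strftime('%Y-%m')

# Number of users whose first login falls in each month, per cohort.
new_users = (
    first_login.groupby(['group_month', 'login_month'])['id']
               .nunique()
               .unstack(fill_value=0)
               .reindex(columns=months, fill_value=0)
)

# Running total across months: users seen in that month or any earlier one.
rolling_ret = new_users.cumsum(axis=1)
print(rolling_ret)
```

If a user should instead count for every month up to their last login (the literal "that month OR later" wording), the same pattern could start from each user's last login month and accumulate from right to left. That variant gives different numbers from the table above.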