I have two straightforward tables on Snowflake:
registration table record the month when the user registered the account, for example:
registered_month | user_id |
---|---|
2018-01-01 | 352 |
2018-01-01 | 421 |
2018-02-01 | 534 |
2018-03-01 | 678 |
...
activity table records the month when the users are active, for example:
active_month | user_id |
---|---|
2018-01-01 | 381 |
2019-03-01 | 427 |
2021-10-01 | 1085 |
2020-10-01 | 4973 |
...
A user registered in '2018-01-01' can be active in '2018-01-01' month, '2018-02-01' month, skip '2018-03-01' month(Not active this month), and come back in '2018-04-01' month. The user_id will be counted as remaining_size in '2018-01-01', '2018-02-01', and '2018-04-01'.
The result table I want is something like this, for example:
acctCreated | Evaluation_Month | months_passed | original_size | remaining | retention |
---|---|---|---|---|---|
2018-01-01 | 2018-01-01 | 0 | 500 | 500 | 1 |
2018-01-01 | 2018-02-01 | 1 | 500 | 480 | 0.96 |
2018-01-01 | 2018-03-01 | 2 | 500 | 455 | 0.91 |
2018-01-01 | 2018-04-01 | 3 | 500 | 425 | 0.85 |
2018-01-01 | 2018-05-01 | 4 | 500 | 400 | 0.8 |
...... Month over month until the current month, the 2021-09-01 month
I really want to provide my solution, but I am really a novice on this and felt a bit lost on solving this issue
With new_user as (
SELECT registered_month, user_id
FROM registration
ORDER BY 1, 2
),
active_user as (
SELECT active_month, user_id
FROM activity
ORDER BY 1, 2
)
SELECT a.active_month,
DATEDIFF(n.registered_month, a.active_month) as months_passed,
COUNT(a.user_id) as counter
FROM active_user a LEFT JOIN new_user n
ON a.user_id = n.user_id
AND a.active_month >= n.registered_month
WHERE n.user_id is NOT NULL
GROUP BY 1,2
ORDER BY 1,2,3
I am thinking about something like this. Could anyone please help me with this kind of cohort analysis/retention rate analysis issue with SQL? Thanks a lot!!!