The above code calculates the cohort-wise retention rates. The cohort is the month of joining. So, the code caluclates the number customers who joined in the month of May 2015, how many are active month on month. The final output is stored in data frame df1 (shown below)
I need help creating dynamic column names which are currently hard coded in the ddply function. M0 which means month of joining, M1 means 1st month from joining, M2 means 2 month from joining to M(n) should be variables. This can calculated by subtracting the farthest expiry date from the earliest joining date.
Unfortunately, I'm not able to auto calculate M0 to M(n) range dynamically.
Here is my code dump which works but is not optimal because I have hardcoded M0 to M3 as a variable in ddply function. So if my input data has a customer with a longer subscription period than 5 months my code will fail.
The input to the code is the following dummy data.
customer dj exp
abc 01/05/15 25/06/15
efg 01/05/15 25/07/15
ghd 01/05/15 25/07/15
mkd 01/06/15 25/07/15
kskm 01/06/15 05/08/15
Reproducible code.
library(zoo)
library(plyr)
customer<-c("abc","efg","ghd","mkd","kskm")
dj<-c("2015-05-01", "2015-05-01", "2015-05-01","2015-06-01","2015-06-01")
exp<-c("2015-06-25", "2015-07-25", "2015-07-25","2015-07-01","2015-08-05")
data.frame(customer,dj,exp)
df$dj <- as.Date(df$dj,"%d/%m/%y")
df$exp <- as.Date(df$exp,"%d/%m/%y")
# The data in the file has different variable names than your example data
# so I'm changing them to match
names(df)[1:3] <- c("customer","dj","exp")
# Make a variable called Cohort that contains only the year and month of joining
# as.yearmon() comes from the 'zoo' package
df$Cohort <- as.yearmon(df$dj)
# Calculate the difference in months between date of expiry and date of joining
df$MonthDiff <- ceiling((df$exp-df$dj)/30)
#df$MonthDiff <- 12*(as.yearmon(df$exp+months(1))-df$Cohort)
range<-as.integer(ceiling((max(df$exp)-min(df$dj)))/30)
# Use ddply() from the 'plyr' package to get the frequency of subjects that are
# still active after 0, 1, 2, and 3 months.
df1 <- ddply(df,.(Cohort),summarize,
M0 = sum(MonthDiff > 0),
M1 = sum(MonthDiff > 1),
M2 = sum(MonthDiff > 2),
M3 = sum(MonthDiff > 3)
)
df1
df1
Cohort M0 M1 M2 M3
1 May 2015 3 3 2 0
2 Jun 2015 2 2 1 0
The above is the output working output. Ask is to make column M0 to M3 dynamic