I have a large dataset which consists of USER_ID and Date. I've worked out how often each user logs in and found that the number of people who log in once or twice is much larger than the number of people who log in regularly. I presume, for the purposes of my site, that this is because trial users never become full users. I would like to find the min and max dates for each user and use these to calculate the duration of each user's subscription, so that I can separate users who lasted under 30 days from those who lasted longer.

library(lubridate)
library(dplyr)
df <- data.frame(dataset)
sdf <- df
df$StartDate <- min(dmy(df$Date)[df$USER_ID == sdf$USER_ID])
range(df$StartDate)
df$EndDate <- max(dmy(df$Date)[df$USER_ID == sdf$USER_ID])
range(df$EndDate)
#df$Span <- as.period(as.Date(df$EndDate) - as.Date(df$StartDate), units = "day")
df$Span <- as.Date(df$EndDate) %--% as.Date(df$StartDate)
range(df$Span)

I can't figure out how to tell R to compare each member of the vector against the entire vector, which is why I tried comparing it against a copy of itself...

Can anyone point me in the right direction?
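
One possible direction, offered as a minimal sketch rather than a definitive answer: `df$USER_ID == sdf$USER_ID` compares the two columns element by element, so it is TRUE on every row and `min()`/`max()` end up running over all dates at once. Grouping by USER_ID with dplyr (already loaded in the code above) computes them per user instead. The sample data below is hypothetical and stands in for `dataset`; it assumes Date is a day/month/year string, as the use of `dmy()` suggests. The `Logins` and `ShortLived` column names are illustrative, not from the original code.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data standing in for `dataset`
# (assumption: Date is stored as day/month/year text)
df <- data.frame(
  USER_ID = c(1, 1, 1, 2, 2, 3),
  Date    = c("01/01/2015", "15/01/2015", "20/03/2015",
              "05/02/2015", "10/02/2015", "07/04/2015")
)

spans <- df %>%
  mutate(Date = dmy(Date)) %>%        # parse the text dates once
  group_by(USER_ID) %>%               # everything below runs per user
  summarise(
    StartDate = min(Date),            # first login for this user
    EndDate   = max(Date),            # last login for this user
    Logins    = n()                   # number of rows (logins) for this user
  ) %>%
  mutate(
    # whole days between first and last login
    Span       = as.numeric(difftime(EndDate, StartDate, units = "days")),
    # TRUE for users whose activity lasted under 30 days
    ShortLived = Span < 30
  )

print(spans)
```

This returns one row per user. To keep StartDate, EndDate and Span on every row of the original data instead, the `summarise()` step could be swapped for a `mutate()` inside the same `group_by()`.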

Jaap
tsuimark

0 Answers