I think I understand your problem, so let me restate it so you can tell me if I am wrong.
You have a data frame where columns represent users, and rows represent days. Taking a column out of your data frame with df[[i]] will therefore give you a time series for one user's activity.
The users didn't all start on the same day, so some of these time series may have a long initial run of 0 activity. This indicates that the user was not yet with your service, and should be NA instead of 0. We can therefore assume everything prior to the date of the first non-zero number should be NA.
Some users have 0 activity on some days after joining your service. This just means they aren't using your service on that day. However, if they leave your service altogether, they will generate a long run of zeros up to the end of their column from the point at which they left.
Some users might have a few 0s at the end of the data frame by chance - they have not left the service, but just happen not to have used it for a few days at the time point when the data frame stops. These 0s should not be converted to NA values. However, if the user has more than 100 consecutive days of zero activity running up to the end of their column, all the zeros at the end should be converted to NA.
Assuming this is what you mean, and assuming there are no NA values to start with in your columns, we can solve the problem with run length encoding. I have commented each line so you can follow the logic:
    for(i in seq_along(df))                        # Loop over every column (one per user)
    {
      user     <- df[[i]]                          # Write the column to a new vector for clarity
      MAX      <- 100                              # Set the maximum number of 0s allowed at the end
      user_rle <- rle(user)                        # Get run length encoding of the column
      lens     <- user_rle$lengths                 # Extract the run-length encoding lengths
      vals     <- user_rle$values                  # Extract the run-length encoding values
      last     <- length(lens)                     # For clarity of code, make alias for last index of rle
      if(vals[1] == 0) {                           # If zeros at the start...
        user[seq(lens[1])] <- NA                   # Replace with NA
      }
      if(vals[last] == 0 && lens[last] > MAX) {    # If more than 100 0s at end
        user[(-lens[last] + 1):0 + length(user)] <- NA   # Replace the final run with NA
      }
      df[[i]] <- user                              # Write the vector back in to the data frame
    }
Note that there are more efficient ways to do this using less code, but this is intended to be easy to follow.
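As one possible shorter variant, here is a minimal sketch that swaps run length encoding for which(), finding the first and last non-zero positions directly. The function name trim_inactive and its max_trailing argument are made up for illustration, and it relies on the same assumption as above, namely that the columns contain no NA values to begin with. The toy data frame uses a threshold of 3 rather than 100 so the effect is visible in a few rows.

```r
# Hypothetical helper (not from the question): set the zeros before a user's
# first activity to NA, and the zeros after their last activity to NA only if
# that final run is longer than max_trailing.
trim_inactive <- function(user, max_trailing = 100) {
  active <- which(user != 0)               # positions with non-zero activity
  if (length(active) == 0) {               # never active: whole column becomes NA
    user[] <- NA
    return(user)
  }
  n     <- length(user)
  first <- active[1]                       # first day with activity
  last  <- active[length(active)]          # last day with activity
  if (first > 1) user[seq_len(first - 1)] <- NA           # zeros before joining
  if (n - last > max_trailing) user[(last + 1):n] <- NA   # zeros after leaving
  user
}

# Toy data, invented for illustration only
df <- data.frame(
  a = c(0, 0, 3, 0, 2, 0, 0, 0, 0),   # joins on day 3, ends with four zeros
  b = c(1, 0, 4, 0, 0, 5, 1, 0, 0)    # active from day 1, ends with two zeros
)

# df[] <- keeps df as a data frame while replacing every column
df[] <- lapply(df, trim_inactive, max_trailing = 3)
```

With the threshold set to 3, column a becomes NA NA 3 0 2 NA NA NA NA, while column b is left unchanged because its final run of zeros is only two days long. For your real data you would leave max_trailing at 100.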