I have data on dates of visits and personal ids:
library(data.table)
n <- 1e6
set.seed(42L)
DT <- data.table(id = sample(1:37000, n, replace = TRUE),
                 date = as.Date("1963-07-13", "%Y-%m-%d") -
                        sample(1:9000, n, replace = TRUE))
I'm adding a variable that ranks each person's visits by date (visit #1, #2, etc.). If I can't differentiate between two visits (same id and same date), they can be ranked in any consistent way.
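To make the desired result concrete, here is a minimal sketch on a small made-up table. The name `toy` and its five rows are hypothetical and only for illustration; they are not part of the real data. It applies the same `rank(..., ties.method = "first")` call per `id`, so tied dates are numbered in row order.

```r
library(data.table)

# Hypothetical toy data: id 1 has two visits on the same date (a tie)
toy <- data.table(
  id   = c(1L, 1L, 1L, 2L, 2L),
  date = as.Date(c("1960-05-02", "1959-01-10", "1960-05-02",
                   "1961-03-03", "1958-12-25"))
)

# Number the visits within each id by date;
# ties.method = "first" breaks ties by order of appearance
toy[, visit.n := rank(date, ties.method = "first"), by = id]

print(toy)
```

For id 1, the earliest date should get visit number 1 and the two tied dates the next two numbers in the order they appear; any other consistent tie-breaking would be equally acceptable for my purposes.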
After my last question (on efficiency) I realised I should learn how to use data.table. So my current solution uses data.table; the only problem is that the command takes a few seconds to run.
> system.time(DT[, visit.n := rank(date, ties.method = "first"), by = id])
   user  system elapsed
   4.42    0.02    4.44
I wonder whether I'm doing something "wrong" or just need to be patient and move on.