Here is an example using some example data on a dataset that is at first 10 million rows (100 users, diffing 100,000 time points each), then 140 million rows (1,400 users, the same number of time points). This transposes the time points to the columns. I should imagine that if you were transposing users to columns it would be even faster (see the sketch at the end). I used @Arun's answer here as a template. Basically it shows that on a really big table you can do it on a single core (i7 2.6 GHz) in under 90 seconds, and that is using code which is probably not fully optimised:
require(data.table)
## Smaller sample dataset - 10 million rows, 100 users, 100,000 time points each
DT <- data.table( Date = sample(100, 1e7, replace = TRUE) , User = rep(1:100, each = 1e5) )
## Size of table in memory
tables()
#      NAME       NROW MB COLS      KEY
# [1,] DT   10,000,000 77 Date,User
# Total: 77MB
## Diff by user, then spread each user's diffs across columns
dt.test <- quote({
  ## Step 1: per-user first differences (prepend 0 so the length matches)
  DT2 <- DT[ , list(Diff = diff(c(0, Date))) , by = list(User) ]
  ## Step 2: name each diff by its time index; as.list makes each group
  ## return a single row with one column per time point
  DT2 <- DT2[ , as.list(setattr(Diff, 'names', 1:length(Diff))) , by = list(User) ]
})
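The second line is the transpose trick: setattr names the Diff vector by reference, and as.list turns the named vector into a one-row list, so each by-group collapses to a single row with a column per time point. A toy illustration (my own example, not part of the benchmark):

## Each element becomes a named list entry, i.e. one column in the result
as.list(setattr(c(10, 20, 30), 'names', 1:3))
# $`1`
# [1] 10
#
# $`2`
# [1] 20
#
# $`3`
# [1] 30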
## Benchmark it
require(microbenchmark)
microbenchmark( eval(dt.test) , times = 5L )
#Unit: seconds
#          expr      min       lq median       uq      max neval
# eval(dt.test) 5.788364 5.825788 5.9295 5.942959 6.109157     5
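The result of the timed expression is one row per user with one column per time point, so the dimensions should be 100 x 100,001 (the User column plus 100,000 diff columns) - a quick sanity check:

## One row per user; User column + 100,000 diff columns
dim(DT2)
# [1]    100 100001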
## And with 140 million rows...
DT <- data.table( Date = sample(100, 1.4e8, replace = TRUE) , User = rep(1:1400, each = 1e5) )
#tables()
#      NAME        NROW   MB
# [1,] DT   140,000,000 1069
microbenchmark( eval(dt.test) , times = 1L )
#Unit: seconds
#          expr     min      lq  median      uq     max neval
# eval(dt.test) 84.3689 84.3689 84.3689 84.3689 84.3689     1
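As for transposing users to columns instead, here is a sketch of how I would do it (not benchmarked; it assumes a reasonably recent data.table with dcast, and the Time index column is something I have added so each diff maps to a row):

## Sketch only: users as columns, one row per time point
DT2 <- DT[ , list(Diff = diff(c(0, Date))) , by = User ]
DT2[ , Time := seq_len(.N) , by = User ]           # per-user time index
DT3 <- dcast(DT2, Time ~ User, value.var = "Diff") # 100,000 rows x (1 + 1,400) cols

That direction produces far fewer, much longer columns than the time-points-as-columns version, which is why I would expect it to be cheaper.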