I have a very large data set where I want to to take the lagged value of a row for the same by group. This functionality is well documented in the data.table package and I've seen other examples of it. However with my data I keep getting an error and I cannot figure out why.
According to documentation, this is the process. I can run this and it works.
library(data.table)
DT = data.table(year=rep(2010:2011, each=3), v1=1:6)
DT[, c("lag1", "lag2") := shift(.SD, 1:2), by=year]
Now here is a sample of my data.
unique_id click_time
100005361 2017-11-09 03:58:32
100005371 2017-11-09 00:53:08
100005371 2017-11-09 04:38:52
100005371 2017-11-09 05:42:30
100005371 2017-11-09 05:42:31
100005371 2017-11-09 05:42:31
100005371 2017-11-09 05:42:32
100005371 2017-11-09 05:42:38
100005371 2017-11-09 05:42:51
100005371 2017-11-09 09:46:57
100005371 2017-11-09 09:46:58
100005371 2017-11-09 09:46:58
100005371 2017-11-09 09:47:00
100005371 2017-11-09 11:37:26
100005391 2017-11-09 05:04:36
100005391 2017-11-09 05:04:41
10000541 2017-11-09 10:55:35
100005411 2017-11-08 22:55:49
100005411 2017-11-08 22:56:14
There are 5 unique ids here. I want to use the lag of click time and join it to each row where applicable. Obviously if there's just 1 row or it's the 1st row then there wont be any lag. Applying the above code should be a one liner. This what I tried.
DT[, c("last_click_time") := shift(.SD), by=unique_id]
This is following the syntax from the documentation and from links I've seen elsewhere such as here: How to create a lag variable within each group?
I get an error though that I can't figure out:
Error in `[.data.frame`(DT, , `:=`(c("last_click_time"), shift(.SD)), :
unused argument (by = unique_id)
Even if I replace the timestamps to something simpler, the error persists.
DT$click_time = seq(1,2000,length.out = 19)
Any idea why this error is popping up? This is driving me mad!
EDIT: The answer to this question is I am an idiot. If you try this on a data.frame() object you will get the error I showed. If you convert to a data.table() then the code works as it should. Embarrassing but hopefully this post will be useful to someone else later on.