I am trying to use lag together with data.table notation to lag variables within groups, since that should be the fastest approach. This is what I tried:
head(DT)
setkey(DT,code,year)
The output is:

         code year pt_N_1y ws_country close is_msci
    1: 130104 2003       0     ISRAEL     0       0
    2: 130104 2004       0     ISRAEL     0       0
    3: 130104 2005       0     ISRAEL     0       0
    4: 130104 2006       0     ISRAEL     0       0
    5: 130104 2007       0     ISRAEL     0       0
    6: 130104 2008       0     ISRAEL     0       0
DT[,L1_is_msci:=.SD[lag(is_msci,1)],by=code]
This produces 50 warnings and fills the new column with NAs.
Isn't .SD supposed to subset the data by code and then apply lag(is_msci, 1) within each group?
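A likely explanation, assuming lag here resolves to base R's stats::lag (that is, dplyr, whose lag() masks it, is not attached): stats::lag does not move the values of a plain vector. It only changes the hidden time-series ("tsp") attribute. .SD[...] then treats those unchanged 0/1 values as row numbers of the group. This small base-R sketch uses a hypothetical 4-value vector x to illustrate both effects.

```r
# Hypothetical is_msci values for one code (not from the question's data)
x <- c(0, 1, 1, 0)

# stats::lag() on a plain vector only shifts the "tsp" time attribute;
# as.numeric() strips that attribute so we can look at the raw values.
lagged <- stats::lag(x, 1)
vals <- as.numeric(lagged)
print(vals)   # same values, same order: nothing was shifted

# .SD[vals] would use these values as ROW NUMBERS of the group:
# index 0 selects nothing and index 1 selects row 1, so the result
# has 2 rows instead of 4, i.e. the wrong length for the group.
rows <- seq_along(x)[vals]
print(rows)
```

Under that assumption, a result of the wrong length per group would be consistent with the recycling warnings and the NA-filled column.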
Ideally I would like a one-line way to create the lags, using only base functions and data.table notation, since that seems the most efficient option for very large datasets and avoids installing many extra packages. Is this possible?
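A minimal sketch of the usual data.table approach, assuming data.table 1.9.6 or later, which provides shift(x, n = 1L, fill = NA, type = "lag"). The six rows are rebuilt from the head(DT) output above; the column types (integers for code, year, pt_N_1y and is_msci, double for close) are assumptions.

```r
library(data.table)  # shift() is available in data.table >= 1.9.6

# Rebuild the six sample rows shown in the question (types assumed)
DT <- data.table(
  code = 130104L, year = 2003:2008, pt_N_1y = 0L,
  ws_country = "ISRAEL", close = 0, is_msci = 0L
)
setkey(DT, code, year)  # sort by code, then year, so lags follow time order

# One-line grouped lag: shift() lags by default and fills with NA
DT[, L1_is_msci := shift(is_msci, 1L), by = code]

# Several lags at once: with n = 1:2, shift() returns a list of two vectors
DT[, c("L1_is_msci", "L2_is_msci") := shift(is_msci, 1:2), by = code]
print(DT)
```

Because shift() runs separately for each code, the first year of every code should get NA rather than the previous code's last value, and := adds the columns by reference without copying DT.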
What I want to achieve is:

         code year pt_N_1y ws_country close is_msci L1_is_msci
    1: 130104 2003       0     ISRAEL     0       0         NA
    2: 130104 2004       0     ISRAEL     0       0          0
    3: 130104 2005       0     ISRAEL     0       0          0
    4: 130104 2006       0     ISRAEL     0       0          0
    5: 130104 2007       0     ISRAEL     0       0          0
    6: 130104 2008       0     ISRAEL     0       0          0
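If even data.table is unavailable, the same grouped lag can be sketched in base R with ave(). The data frame and the second code 999999 are hypothetical, added only to show that the lag restarts at NA for each code; lag1 is a made-up helper name, not an existing function.

```r
# Hypothetical data mirroring the question's columns, with a second code
df <- data.frame(
  code    = c(130104, 130104, 130104, 999999, 999999),
  year    = c(2003, 2004, 2005, 2003, 2004),
  is_msci = c(0, 0, 1, 1, 0)
)
df <- df[order(df$code, df$year), ]  # lags assume rows sorted by year within code

# Hypothetical helper: shift a vector down by one, padding the front with NA
lag1 <- function(v) c(NA, head(v, -1L))

# ave() applies lag1 to each code's slice and puts results back in place
df$L1_is_msci <- ave(df$is_msci, df$code, FUN = lag1)
print(df)
```

The same helper logic should also fit inside data.table notation, e.g. DT[, L1_is_msci := c(NA, head(is_msci, -1L)), by = code], again assuming DT is sorted by year within code.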