I am attempting to run a rolling regression within a data.table. There are a number of questions that get at what I am trying to do, but they are generally 3+ years old and offer inelegant answers. (see: here, for example)
I am wondering if there has been any update to the data.table package that make this more intuitive/ faster?
Here is what I am trying to do. My code looks like this:
DT<-data.table(
Date = seq(as.Date("2000/1/1"), by = "day", length.out = 1000),
x1=rnorm(1000),
x2=rnorm(1000),
x3=rnorm(1000),
y=rnorm(1000),
country=rep(c("a","b","c","d"), each=25))
I would like to regress y on x1, x2 and x3, over a rolling 180 day window, by country, and store the coefficients by date.
Ideally the syntax would look something like this:
DT[,.(coef.x1 := coef(y~x1+x2+x3)[2] ,
coef.x2 := coef(y~x1+x2+x3)[3],
coef(y~x1+x2+x3)[4],
by=c("country",ROLLING WINDOW)]
... but even more elegant/ avoiding the repetition if possible! :)
I have yet to get the rollapply syntax to work well for me for some reason.
Thank you!
EDIT:
Thank you @michaelchirico.
Your suggestion comes close to what I'm aiming for - and maybe its possible to modify the code to receive it but again, I am stuck.
Here is a more careful articulation of what I need. Some code:
DT<-data.table(
Date = rep(seq(as.Date("2000/1/1"), by = "day", length.out = 10),times=3), #same dates per country
x1=rep(rnorm(10),time=3), #x1's repeat - same per country
x2=rep(rnorm(10), times=3),#x2's repeat - same per country
x3=rep(rnorm(10), times=3), #x3's repeat - same per country
y=rnorm(30), #y's do not repeat and are unique per country per day
country=rep(c("a","b","c"), each=10))
#to calculate the coefficients by individual country:
a<-subset(DT,country=="a")
b<-subset(DT,country=="b")
window<-5 #declare window
coefs.a<-coef(lm(y~x1+x2+x3, data=a[1:window]))#initialize my coef variable
coefs.b<-coef(lm(y~x1+x2+x3, data=b[1:window]))#initialize my coef variable
##calculate coefficients per window
for(i in 1:(length(a$Date)-window)){
coefs.a<-rbind(coefs.a, coef(lm(y~x1+x2+x3, data=a[(i+1):(i+window-1)])))
coefs.b<-rbind(coefs.b, coef(lm(y~x1+x2+x3, data=b[(i+1):(i+window-1)])))
}
The difference in this dataset versus the prior one is that the dates, and x1, x2, x3 all repeat. My y's are unique for each country.
In my actual data set I have 120 countries. I can calculate this for each country, but it is awfully slow and then I have to rejoin all of the coefficients into a single dataset for analysis of the results.
Is there a way similar to what you proposed to end up with a single data.table, with all observations?
Thanks once more!!