I'm writing a script that splits my data into rolling 12-month data frames and then runs a regression on each one. My dataset has around 5 million rows and a sample period from Jan 2015 to Mar 2023, which gives 88 rolling windows. Most of the script is already written, but I'm struggling to write a function that splits the data into the rolling 12-month intervals. What I do now works but is really inefficient:
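library(dplyr)
library(zoo)  # as.yearmon() is provided by zoo, not mondate
x$Date <- as.yearmon(x$Date)
# One hand-written filter per 12-month window, each tagged with its own ID.
# The bounds are converted to yearmon so both sides of between() share a class.
a <- x %>% filter(between(Date, as.yearmon("Jan 2015"), as.yearmon("Dec 2015"))) %>% mutate(ID = "a")
b <- x %>% filter(between(Date, as.yearmon("Feb 2015"), as.yearmon("Jan 2016"))) %>% mutate(ID = "b")
c <- x %>% filter(between(Date, as.yearmon("Mar 2015"), as.yearmon("Feb 2016"))) %>% mutate(ID = "c")
d <- x %>% filter(between(Date, as.yearmon("Apr 2015"), as.yearmon("Mar 2016"))) %>% mutate(ID = "d")
e <- x %>% filter(between(Date, as.yearmon("May 2015"), as.yearmon("Apr 2016"))) %>% mutate(ID = "e")
......
jjjj <- x %>% filter(between(Date, as.yearmon("Apr 2022"), as.yearmon("Mar 2023"))) %>% mutate(ID = "jjjj")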
df<-list(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,bb,cc,dd,ee,ff,gg,hh,ii,jj,kk,ll,mm,nn,oo,pp,qq,rr,ss,tt,uu,vv,ww,xx,yy,zz,aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,ooo,ppp,qqq,rrr,sss,ttt,uuu,vvv,www,xxx,yyy,zzz,aaaa,bbbb,cccc,dddd,eeee,ffff,gggg,hhhh,iiii,jjjj)
In other words, I create the 88 data frames one at a time, give each a unique identifier, and collect them all in a list. I then use plyr::llply to run functions on that list.
What I can't work out is how to generate these rolling 12-month windows in a loop instead of writing each one by hand.
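For reference, the kind of rolling split described above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: month_index() and split_rolling() are hypothetical helper names (not from any package), the toy data frame stands in for the real 5-million-row data, and the ID is a running number rather than the letter codes used above.

```r
library(zoo)  # for as.yearmon()

# Hypothetical helper: turn a year-month into an integer month count,
# so windows can be compared with exact integer arithmetic.
month_index <- function(d) as.integer(round(12 * as.numeric(as.yearmon(d))))

# Hypothetical helper: return a list with one data frame per rolling window.
# `first` and `last` are the first and last months of the sample period.
split_rolling <- function(data, first, last, width = 12L) {
  idx <- month_index(data$Date)
  # Start month of every window; the last window ends exactly at `last`
  starts <- seq(month_index(first), month_index(last) - width + 1L)
  lapply(seq_along(starts), function(i) {
    keep <- idx >= starts[i] & idx <= starts[i] + width - 1L
    win <- data[keep, , drop = FALSE]
    win$ID <- i  # running window number used as the identifier
    win
  })
}

# Toy stand-in for the real data: one row per month, Jan 2015 to Mar 2023
toy <- data.frame(Date = as.yearmon(2015 + 0:98 / 12), y = seq_len(99))

windows <- split_rolling(toy, first = "2015-01", last = "2023-03")
length(windows)  # 88 windows for this sample period
```

The resulting list has the same shape as the hand-built one, so it could be passed to plyr::llply (or lapply) unchanged. Note that each window is a copy of its rows, so with the full dataset the list holds roughly 12 times the original data in memory.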