I'm trying to write a script that splits my data into rolling 12-month data frames and then runs a regression on each one. My dataset has around 5 million rows and a sample period from Jan 2015 to Mar 2023, so I have 88 different data frames. I have the bulk of my script written, but I'm struggling to write a function that splits the dataset into 12-month rolling intervals. What I'm doing right now to split the data is really inefficient but works; see below:

library(dplyr)
library(zoo)  # as.yearmon() is provided by zoo
library(mondate)

x$Date<-as.yearmon(x$Date)

a<-x%>%filter(between(Date,"Jan 2015","Dec 2015"))%>%mutate(ID="a")
b<-x%>%filter(between(Date,"Feb 2015","Jan 2016"))%>%mutate(ID="b")
c<-x%>%filter(between(Date,"Mar 2015","Feb 2016"))%>%mutate(ID="c")
d<-x%>%filter(between(Date,"Apr 2015","Mar 2016"))%>%mutate(ID="d")
e<-x%>%filter(between(Date,"May 2015","Apr 2016"))%>%mutate(ID="e")
......
jjjj<-x%>%filter(between(Date,"Apr 2022","Mar 2023"))%>%mutate(ID="jjjj")

df<-list(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,bb,cc,dd,ee,ff,gg,hh,ii,jj,kk,ll,mm,nn,oo,pp,qq,rr,ss,tt,uu,vv,ww,xx,yy,zz,aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,ooo,ppp,qqq,rrr,sss,ttt,uuu,vvv,www,xxx,yyy,zzz,aaaa,bbbb,cccc,dddd,eeee,ffff,gggg,hhhh,iiii,jjjj)

In my code I individually split the data into 88 data frames, assign each one a unique identifier, and then put all the data frames in a list. I then use the plyr::llply function to run functions on the list.

I have no idea how to create a loop for rolling 12 months.
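
One possible way to generate the windows programmatically is sketched below. It is only an illustration under stated assumptions: the toy data frame `x`, its `value` column, the integer `month_id` helper and the numeric `ID` are all made up for the example, and `Date` is assumed to already be a zoo `yearmon` column. Note that every row is still copied into up to 12 windows, which matters at 5 million rows.

```r
library(zoo)  # as.yearmon() and yearmon arithmetic

# Toy stand-in for the real data (assumption): one row per month,
# Jan 2015 to Mar 2023, i.e. 99 months.
x <- data.frame(
  Date  = as.yearmon("2015-01") + 0:98 / 12,
  value = seq_len(99)
)

# Integer month index (year * 12 + month - 1), so window bounds are
# compared as whole numbers rather than as text or fractions.
month_id <- round(12 * as.numeric(x$Date))

# Every start month whose 12-month window still ends inside the sample.
starts <- seq(min(month_id), max(month_id) - 11)

# One data frame per window, tagged with a numeric ID.
windows <- lapply(seq_along(starts), function(i) {
  in_window <- month_id >= starts[i] & month_id <= starts[i] + 11
  w <- x[in_window, , drop = FALSE]
  w$ID <- rep(i, nrow(w))
  w
})

# Label each list element with its starting month.
names(windows) <- format(as.yearmon(starts / 12))

length(windows)
```

The resulting list can be passed to `lapply()` or `plyr::llply()` in the same way as the hand-built list above.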

jonrsharpe
a_js12
  • Don't split your data - you'll end up copying it a bunch unnecessarily. [The answers to this question](https://stackoverflow.com/q/23162937/903061) illustrate rolling regressions in R without copying your data, or it looks like the [rollRegres package](https://CRAN.R-project.org/package=rollRegres) implements efficient rolling regression which might work directly. – Gregor Thomas May 01 '23 at 15:28
  • Thanks for the link, the issue I have is that I'm running a decay (nonlinear) model which uses a self-starter function that I've created. Also I'm fitting the regression and plotting the results. – a_js12 May 01 '23 at 15:36
  • That rules out the `rollRegres` package, but not the `rollapply` illustrated in my first link. – Gregor Thomas May 01 '23 at 15:42
  • `between(Date,"Jan 2015","Dec 2015")` will not work as you hope: it is doing a text-based comparison, not a date/number-based comparison, which will put (for example) `"Jan 2015"` between `"Aug 2023"` and `"Jul 1999"` (an extreme example, to be clear). If you want number-like comparisons, you need number-like fields: both your `Date` variable and the two lower/upper arguments to `between`. (That comparison is only concerned with the first few characters; in this case the years are never considered.) – r2evans May 01 '23 at 15:48
  • I see what you mean r2evans. But I checked a few random data frames, and they returned the between dates that were specified. – a_js12 May 01 '23 at 16:33
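
The difference between text-based and date-based comparison discussed in the comments above can be illustrated with a small sketch. The variable names here are made up for the example, and it makes no claim about how `between()` behaves on the asker's actual data. Under the usual alphabetical collation the character comparison should come out TRUE, while the `yearmon` comparison follows calendar order.

```r
library(zoo)  # as.yearmon()

# Plain character strings are compared alphabetically, so the result is
# decided by the month name before the year is ever looked at.
text_cmp <- "Aug 2023" < "Jan 2015"

# Converting both sides to yearmon compares calendar positions instead.
date_cmp <- as.yearmon("2023-08") < as.yearmon("2015-01")

# A window filter with typed bounds: 24 months of dates, keep Jan-Dec 2015.
dates <- as.yearmon("2015-01") + 0:23 / 12
lower <- as.yearmon("2015-01")
upper <- as.yearmon("2015-12")
in_window <- dates >= lower & dates <= upper

c(text = text_cmp, date = date_cmp, months_kept = sum(in_window))
```

Using `>=` and `<=` on `yearmon` values on both sides keeps the comparison numeric regardless of how the months are printed.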

0 Answers