2

I am trying to get a rolling prediction of a dynamic timeseries in R (and then work out squared errors of the forecast). I based a lot of this code on this StackOverflow question, but I am very new to R so I am struggling quite a bit. Any help would be much appreciated.

require(zoo)
require(dynlm)

set.seed(12345)
#create variables
x<-rnorm(mean=3,sd=2,100)
y<-rep(NA,100)
y[1]<-x[1]
for(i in 2:100) y[i]=1+x[i-1]+0.5*y[i-1]+rnorm(1,0,0.5)
int<-1:100
dummydata<-data.frame(int=int,x=x,y=y)

zoodata<-as.zoo(dummydata)

prediction<-function(series)
  {
  mod<-dynlm(formula = y ~ L(y) + L(x), data = series) #get model
   nextOb<-nrow(series)+1
   #make forecast
   predicted<-coef(mod)[1]+coef(mod)[2]*zoodata$y[nextOb-1]+coef(mod)[3]*zoodata$x[nextOb-1]

   #strip timeseries information
   attributes(predicted)<-NULL

   return(predicted)
  }                

rolling<-rollapply(zoodata,width=40,FUN=prediction,by.column=FALSE)

This returns:

20          21      .....      80
10.18676  10.18676          10.18676

Which has two problems I was not expecting:

  1. Runs from 20->80, not 40->100 as I would expect (as the width is 40)
  2. The forecasts it gives out are constant: 10.18676

What am I doing wrong? And is there an easier way to do the prediction than to write it all out? Thanks!

Community
  • 1
  • 1
MatthewK
  • 171
  • 2
  • 8

1 Answers1

2

The main problem with your function is the data argument to dynlm. If you look in ?dynlm you will see that the data argument must be a data.frame or a zoo object. Unfortunately, I just learned that rollapply splits your zoo objects into array objects. This means that dynlm, after noting that your data argument was not of the right form, searched for x and y in your global environment, which of course were defined at the top of your code. The solution is to convert series into a zoo object. There were a couple of other issues with your code, I post a corrected version here:

prediction<-function(series) {
   mod <- dynlm(formula = y ~ L(y) + L(x), data = as.zoo(series)) # get model
   # nextOb <- nrow(series)+1 # This will always be 21. I think you mean:
   nextOb <- max(series[,'int'])+1 # To get the first row that follows the window
   if (nextOb<=nrow(zoodata)) {   # You won't predict the last one
     # make forecast
     # predicted<-coef(mod)[1]+coef(mod)[2]*zoodata$y[nextOb-1]+coef(mod)[3]*zoodata$x[nextOb-1]
     # That would work, but there is a very nice function called predict
     predicted=predict(mod,newdata=data.frame(x=zoodata[nextOb,'x'],y=zoodata[nextOb,'y']))
     # I'm not sure why you used nextOb-1  
     attributes(predicted)<-NULL
     # I added the square error as well as the prediction.
     c(predicted=predicted,square.res=(predicted-zoodata[nextOb,'y'])^2)
   }
}    

rollapply(zoodata,width=20,FUN=prediction,by.column=F,align='right')

Your second question, about the numbering of your results, can be controlled by the align argument is rollapply. left would give you 1..60, center (the default) would give you 20..80 and right gets you 40..100.

nograpes
  • 18,623
  • 1
  • 44
  • 67