I am trying to build a rolling regression function based on the example here, but in addition to returning the predicted values, I would like to return some rolling model diagnostics (i.e. coefficients, t-values, and maybe R^2). I would like the results to be returned in discrete objects based on the type of result. The example provided in the link above successfully creates the rolling predictions, but I need some assistance packaging and writing out the rolling model diagnostics.

In the end, I would like the function to return four (4) objects:

  1. Predictions
  2. Coefficients
  3. T values
  4. R^2

Below is the code:

require(zoo)
require(dynlm)

## Create Some Dummy Data
set.seed(12345)
x <- rnorm(mean=3,sd=2,100)
y <- rep(NA,100)
y[1] <- x[1]
for(i in 2:100) y[i]=1+x[i-1]+0.5*y[i-1]+rnorm(1,0,0.5)
int <- 1:100
dummydata <- data.frame(int=int,x=x,y=y)
zoodata <- as.zoo(dummydata)


rolling.regression <- function(series) {
  mod <- dynlm(formula = y ~ L(y) + L(x), data = as.zoo(series)) # get model

  nextOb <- max(series[,'int'])+1 # To get the first row that follows the window
  if (nextOb<=nrow(zoodata)) {   # You won't predict the last one

    # 1) Make Predictions
    predicted <- predict(mod,newdata=data.frame(x=zoodata[nextOb,'x'],y=zoodata[nextOb,'y']))
    attributes(predicted) <- NULL
    c(predicted=predicted,square.res=(predicted-zoodata[nextOb,'y'])^2) # '=' names the element; '<-' would assign a variable instead

    # 2) Extract coefficients
    #coefficients <- coef(mod)

    # 3) Extract rolling coefficient t values
    #tvalues <- ????(mod)

    # 4) Extract rolling R^2
    #rsq <-


  }
}    

rolling.window <- 20
results.z <-  rollapply(zoodata, width=rolling.window, FUN=rolling.regression, by.column=F, align='right')

So after figuring out how to extract t values from the model (i.e. mod), what do I need to do to make the function return three (3) separate objects (i.e. Predictions, Coefficients, and T-values)?

I am fairly new to R, really new to functions, and extremely new to zoo, and I'm stuck.

Any assistance would be greatly appreciated.

MikeTP

1 Answer

I hope I understood you correctly; here is a small edit of your function:

rolling.regression <- function(series) {
  mod <- dynlm(formula = y ~ L(y) + L(x), data = as.zoo(series)) # get model

  nextOb <- max(series[,'int'])+1 # To get the first row that follows the window
  if (nextOb<=nrow(zoodata)) {   # You won't predict the last one
    # 1) Make Predictions
    predicted=predict(mod,newdata=data.frame(x=zoodata[nextOb,'x'],y=zoodata[nextOb,'y']))
    attributes(predicted)<-NULL
    #Solution 1; Quicker to write
    #     c(predicted=predicted, 
    #       square.res=(predicted-zoodata[nextOb,'y'])^2,
    #       summary(mod)$coef[, 1],
    #       summary(mod)$coef[, 3],
    #       AdjR = summary(mod)$adj.r.squared)

    #Solution 2; Get column names right
    c(predicted=predicted, 
      square.res=(predicted-zoodata[nextOb,'y'])^2,
      coef_intercept = summary(mod)$coef[1, 1],
      coef_Ly = summary(mod)$coef[2, 1],
      coef_Lx = summary(mod)$coef[3, 1],
      tValue_intercept = summary(mod)$coef[1, 3],
      tValue_Ly = summary(mod)$coef[2, 3],
      tValue_Lx = summary(mod)$coef[3, 3],
      AdjR = summary(mod)$adj.r.squared)
  }
}



rolling.window <- 20
results.z <-  rollapply(zoodata, width=rolling.window, FUN=rolling.regression, by.column=F, align='right')

    head(results.z)
   predicted square.res coef_intercept   coef_Ly  coef_Lx tValue_intercept tValue_Ly tValue_Lx      AdjR
20 10.849344   0.721452     0.26596465 0.5798046 1.049594       0.38309211  7.977627  13.59831 0.9140886
21 12.978791   2.713053     0.26262820 0.5796883 1.039882       0.37741499  7.993014  13.80632 0.9190757
22  9.814676  11.719999     0.08050796 0.5964808 1.073941       0.12523824  8.888657  15.01353 0.9340732
23  5.616781  15.013297     0.05084124 0.5984748 1.077133       0.08964998  9.881614  16.48967 0.9509550
24  3.763645   6.976454     0.26466039 0.5788949 1.068493       0.51810115 11.558724  17.22875 0.9542983
25  9.433157  31.772658     0.38577698 0.5812665 1.034862       0.70969330 10.728395  16.88175 0.9511061
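
If separate objects per result type are wanted, one option is to split this single result by column-name prefix after the fact. The sketch below is only an illustration: it copies the first two rows of the output above into a plain matrix, and `pick` is a hypothetical helper name. The same column indexing should also apply to `results.z` itself, though that is not tested here.

```r
# Two rows copied from the output above, stored as a plain matrix for illustration
res <- matrix(
  c(10.849344, 0.721452, 0.26596465, 0.5798046, 1.049594,
    0.38309211, 7.977627, 13.59831, 0.9140886,
    12.978791, 2.713053, 0.26262820, 0.5796883, 1.039882,
    0.37741499, 7.993014, 13.80632, 0.9190757),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("20", "21"),
                  c("predicted", "square.res",
                    "coef_intercept", "coef_Ly", "coef_Lx",
                    "tValue_intercept", "tValue_Ly", "tValue_Lx", "AdjR"))
)

# Hypothetical helper: select columns whose names match a pattern;
# drop = FALSE keeps the matrix shape even when one column matches
pick <- function(m, pattern) m[, grep(pattern, colnames(m)), drop = FALSE]

predictions <- pick(res, "^(predicted|square\\.res)$")
coefs       <- pick(res, "^coef_")
tvalues     <- pick(res, "^tValue_")
adj.r2      <- pick(res, "^AdjR$")
```

Because the split relies only on the prefixes, it keeps working if more `coef_` or `tValue_` columns are added later.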

To see how it works, make a small example with a regression:

x <- rnorm(1000); y <- 2*x + rnorm(1000)
reg <- lm(y ~ x)
summary(reg)$coef
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) 0.02694322 0.03035502  0.8876033 0.374968
x           1.97572544 0.03177346 62.1816310 0.000000

As you can see, calling summary first and then getting the coefficients of it (coef(summary(reg)) works as well) gives you a table with estimates, standard errors, and t-values. So estimates are saved in column 1 of that table, t-values in column 3. And that's how I obtain them in the updated rolling.regression function.
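
As a small sketch of the same idea, the columns can also be selected by name rather than by position, which is a little more robust if the table layout is ever different. The seed and the variable names `tab`, `estimates`, and `tvalues` are my own choices for illustration.

```r
set.seed(1)
x <- rnorm(1000)
y <- 2 * x + rnorm(1000)
reg <- lm(y ~ x)

tab <- coef(summary(reg))         # same table as summary(reg)$coef
estimates <- tab[, "Estimate"]    # column 1
tvalues   <- tab[, "t value"]     # column 3

# Each t value is the estimate divided by its standard error
check <- all.equal(tvalues, estimates / tab[, "Std. Error"])

# Plain and adjusted R^2 live on the summary object itself
r2     <- summary(reg)$r.squared
adj.r2 <- summary(reg)$adj.r.squared
```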

EDIT

I updated my solution; now it also contains the adjusted R2. If you just want the normal R2, use summary(mod)$r.squared instead of summary(mod)$adj.r.squared.

EDIT 2

A quick and dirty hack to name the columns:

rolling.regression <- function(series) {
  mod <- dynlm(formula = y ~ L(y) + L(x), data = as.zoo(series)) # get model

  nextOb <- max(series[,'int'])+1 # To get the first row that follows the window
  if (nextOb<=nrow(zoodata)) {   # You won't predict the last one
    # 1) Make Predictions
    predicted=predict(mod,newdata=data.frame(x=zoodata[nextOb,'x'],y=zoodata[nextOb,'y']))
    attributes(predicted)<-NULL
    #Get variable names
    strVar <- c("Intercept", paste0("L", 1:(nrow(summary(mod)$coef)-1)))
    vec <- c(predicted=predicted, 
             square.res=(predicted-zoodata[nextOb,'y'])^2,
             AdjR = summary(mod)$adj.r.squared,
             summary(mod)$coef[, 1],
             summary(mod)$coef[, 3])
    names(vec)[4:length(vec)] <- c(paste0("Coef_", strVar), paste0("tValue_", strVar))

    vec
  }
}
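
An alternative sketch for the naming step: instead of generating labels like L1, L2, the row names of the coefficient table can be reused, so the names follow whatever terms are in the formula. `model.stats` is a hypothetical helper, and plain `lm` on made-up data is used here only to keep the example self-contained. With `dynlm` the row names should be the formula terms, but I have not checked that here.

```r
# Hypothetical helper: flatten a fitted model's diagnostics into one named vector
model.stats <- function(mod) {
  s   <- summary(mod)   # compute the summary once instead of on every line
  tab <- coef(s)
  c(setNames(tab[, "Estimate"], paste0("coef_",   rownames(tab))),
    setNames(tab[, "t value"],  paste0("tValue_", rownames(tab))),
    AdjR = s$adj.r.squared)
}

# Made-up data with one and then two regressors
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(50)

one <- model.stats(lm(y ~ x1, data = d))
two <- model.stats(lm(y ~ x1 + x2, data = d))
names(one)   # "coef_(Intercept)" "coef_x1" "tValue_(Intercept)" "tValue_x1" "AdjR"
```

The vector grows automatically when a term is added, so nothing in the naming code has to change with the formula.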
Christoph_J
  • Maybe instead you could return it like a list. Then you could just return `list(summary(mod)$coef,summary(mod)$adj.r.squared)`. You wouldn't have to fix the variable names then. – nograpes Feb 06 '13 at 18:33
  • Great solution, thanks for taking the time to help me out. I was really struggling with how to extract the coefficients and t-values, and you showed me how. Your solution provides the desired results, but as one zoo object. I was thinking it would be preferable to have multiple discrete objects returned depending on the type of result (i.e. actual-predicted-resid, coefficients, t-values, R2 and AdjR2), but after doing some more research it seems that R functions can only return one object, so I would have to somehow build a list and index into the specific discrete object. – MikeTP Feb 06 '13 at 18:34
  • Does anyone have any good ideas on how to set this up to dynamically name the coefficients and t-values, so that if variables were added to or removed from the model formula the names would adjust accordingly? – MikeTP Feb 06 '13 at 18:37
  • @nograpes I thought about using a list at the beginning as well, but note that not one regression is run here, but many via `rollapply`. And then you get problems when combining that again, as far as I see. – Christoph_J Feb 06 '13 at 19:14
  • @MikeTP As nograpes already wrote, normally a function can also return a list, and in a list you can put as many different objects as you like. But here you are running the function many times and combining the results, and this gets a little bit tricky with lists. For your second question: Did you check out my commented code? If you have a varying number of variables, you might as well just return the column of coefficients; this returns a table with a varying number of columns. Then you just have to rename your table dynamically by setting `names(results.z)`. – Christoph_J Feb 06 '13 at 19:19
  • But your function isn't really dynamic anyway, you hardcoded the call to dynlm in your function. So you always need a `series` object with columns y, L(x) and L(y). This is another issue though. This [link](http://stackoverflow.com/questions/6968127/how-can-i-dynamically-regress-and-predict-multiple-items-with-r) could get you started. – Christoph_J Feb 06 '13 at 19:21
  • @Christoph_J, Thanks again for your help and sorry for not focusing enough on your solution 1, which after some more review I realize does in fact partially do some dynamic naming. Using that solution, I guess I need to figure out a way to prefix the names in summary(mod)$coef[, 1] with "coef_" and the names in summary(mod)$coef[, 3] with "tvalue_". PS: I'm slow to reply not because I am not studying your response, but because as a new user it takes me some time to read and understand what is going on. Thanks again. I will study the link you provided. – MikeTP Feb 06 '13 at 20:10
  • @MikeTP No worries; I changed the function, now the columns are named dynamically within the function. This should be flexible enough, so even if you would have more independent variables, it should still work. But then, in your current set up, you can only have two independent variables because you hardcoded the `dynlm` call, so this is actually too much effort for this particular function. I just showed it so you have one way (I'm not saying it's the best) on how to do it. – Christoph_J Feb 06 '13 at 22:43
  • So my advice: If this is a one-time function, i.e. if you want to use that function for one particular problem, you are fine to go with the above solution. If you want a flexible setup so that you are able to call it with many different variables, you probably should do some rewriting of the function. – Christoph_J Feb 06 '13 at 22:45
  • Thanks...looking at it now. I noticed you used paste0; is that a custom function or should it just be paste? – MikeTP Feb 07 '13 at 00:02
  • `paste0` was introduced with R 2.15.0, I think, so you don't have to set `sep=""` anymore. So it's not custom, just new. You can also use `paste(..., sep="")`. – Christoph_J Feb 08 '13 at 09:05
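
For the list idea discussed in the comments, here is a rough sketch of one way it could be set up. It is not the `rollapply`/`dynlm` approach from the answer: the lags are built by hand so that plain `lm` can stand in for `dynlm(y ~ L(y) + L(x))`, and the windows are looped over with `lapply`. The names `d`, `fits`, and `results` are assumptions made for illustration. Note also that each window here holds 20 already-lagged rows, so the numbers will not match the output above exactly.

```r
set.seed(12345)
n <- 100
x <- rnorm(n, mean = 3, sd = 2)
y <- numeric(n)
y[1] <- x[1]
for (i in 2:n) y[i] <- 1 + x[i - 1] + 0.5 * y[i - 1] + rnorm(1, 0, 0.5)

# Lag by hand: row t holds y[t+1] together with y[t] and x[t]
d <- data.frame(y = y[-1], Ly = y[-n], Lx = x[-n])

width  <- 20
starts <- seq_len(nrow(d) - width + 1)

# Each window returns a list, so differently shaped results can sit side by side
fits <- lapply(starts, function(s) {
  end <- s + width - 1
  fit <- lm(y ~ Ly + Lx, data = d[s:end, ])
  sm  <- summary(fit)
  nxt <- end + 1
  has.next <- nxt <= nrow(d)   # the last window has nothing left to predict
  list(
    pred = c(predicted = if (has.next) unname(predict(fit, newdata = d[nxt, ])) else NA_real_,
             actual    = if (has.next) d$y[nxt] else NA_real_),
    coef = coef(sm)[, "Estimate"],
    tval = coef(sm)[, "t value"],
    r2   = c(R2 = sm$r.squared, AdjR2 = sm$adj.r.squared)
  )
})

# Stack each component across windows: one row per window
results <- list(
  predictions  = do.call(rbind, lapply(fits, `[[`, "pred")),
  coefficients = do.call(rbind, lapply(fits, `[[`, "coef")),
  tvalues      = do.call(rbind, lapply(fits, `[[`, "tval")),
  r.squared    = do.call(rbind, lapply(fits, `[[`, "r2"))
)
str(results, max.level = 1)
```

The function still returns a single object, but each result type can then be pulled out by name, e.g. `results$coefficients` or `results$tvalues`.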