I have a panel data set like the following:

          id  date1    Returns Mkt.RF SMB.y HML.y RMW.y CMA.y RF.y
1 LP60068503 200002  3.9487727   5.95  0.47 -9.00  3.25 -7.18 0.43
2 LP60068503 200003  4.6201232   0.66 -5.04  3.10 -2.74  4.82 0.47
3 LP60068503 200004 -1.2757605  -5.58 -4.37  5.67 -1.08  1.22 0.46
4 LP60068503 200005 -1.3916501  -0.08  0.18  6.05 -3.30  4.84 0.50
5 LP60068503 200006 -2.4193548   0.67  0.50  2.94 -3.15  1.10 0.40
6 LP60068503 200007  0.8264463  -1.58 -0.71  3.25 -0.19 -0.10 0.48

               id  date1    Returns Mkt.RF SMB.y HML.y RMW.y CMA.y RF.y
340373 LP65117791 201207  3.4376360   0.56 -1.38 -2.57  2.29 -2.04 0.00
340374 LP65117791 201208  0.7893412   4.51  0.06  3.38 -1.68  1.45 0.01
340375 LP65117791 201209  0.2556494   3.49  1.65  2.33 -1.52  0.69 0.01
340376 LP65117791 201210 -1.0310320   1.67 -0.61  2.06 -0.93 -0.15 0.01
340377 LP65117791 201211  0.3411351   2.28 -2.40 -0.55  0.62 -0.80 0.01
340378 LP65117791 201212  0.7903986   3.38  2.48  3.09 -0.53  0.89 0.01

I would like to run the following rolling regression with three independent variables and store the intercept alpha (a) in a new column:

Returns ~ a + β1*Mkt.RF + β2*SMB.y + β3*HML.y + e

The window width should be 36 months.

Using the roll_regres.fit function from the rollRegres package works with one independent variable, but I would like to know how to extend this to three independent variables. With one independent variable, the code for getting the alpha intercept in a new column looks like this:

Returns ~ a + β1*Mkt.RF + e

# first column of x is the constant, so coefs[, 1] is the intercept
dt[, alpha :=
   roll_regres.fit(x = cbind(1, .SD[["Mkt.RF"]]), y = .SD[["Returns"]],
                   width = 36L)$coefs[, 1],
   by = id]

So I'd like to know how to adjust this to three or more independent variables.

user9259005

2 Answers

It seems like you are using data.table, so I will too. You can achieve what you want as follows:

#####
# simulate data 
n_gr   <- 100 
n_date <- 50
n <- n_gr * n_date
id <- gl(n_gr, n_date)
date. <- rep(1:n_date, n_gr)

set.seed(39820955)
X <- matrix(rnorm(n * 3), n, 
            dimnames = list(NULL, c("Mkt.RF", "SMB.y", "HML.y")))
library(data.table)  
dt <- data.table(id = id, date1 = date., Returns = rowSums(X) + rnorm(n), X)

#####
# estimate coefficients
setkey(dt, id, date1) # sort data

library(rollRegres)
func <- function(SD){
  x <- roll_regres.fit(
    x = cbind(1, SD$Mkt.RF, SD$SMB.y, SD$HML.y), y = SD$Returns, 
    width = 36L)$coefs
  split(x, rep(1:ncol(x), each = nrow(x))) # turn matrix into list of column vectors
}
dt[, c("alpha", "b.Mkt", "b.SMB", "b.HML") := func(.SD), by = id]

tail(dt)
#R     id date1  Returns  Mkt.RF  SMB.y  HML.y  alpha b.Mkt b.SMB b.HML
#R 1: 100    45  1.08926  1.0470  0.277 -0.179 -0.355 0.854  1.25  1.09
#R 2: 100    46 -0.09738 -0.0718 -0.190  0.860 -0.318 0.813  1.26  1.06
#R 3: 100    47  0.00525  1.3981  1.618 -1.335 -0.349 0.742  1.18  1.09
#R 4: 100    48  0.65891 -0.3901 -0.239  1.558 -0.266 0.732  1.02  1.11
#R 5: 100    49 -0.18841 -0.4336  0.266  0.657 -0.265 0.761  1.06  1.14
#R 6: 100    50 -1.55515 -0.6723 -1.567  1.014 -0.275 0.769  1.08  1.14

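The column alpha is the intercept, and b.Mkt, b.SMB, and b.HML are the slopes on Mkt.RF, SMB.y, and HML.y. The first 35 rows of each id are NA because a full 36-observation window is not yet available.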
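To make explicit what each row of the output holds, here is a minimal base-R sketch of a naive rolling OLS. It assumes trailing windows, so row i is fitted on rows i - 35 to i. The function name naive_roll_coefs is made up for illustration and is not part of rollRegres. It refits every window from scratch, so it is only meant for checking results on small data.

```r
# Naive rolling OLS (hypothetical helper, not part of rollRegres).
# X must already contain a column of ones for the intercept.
naive_roll_coefs <- function(X, y, width) {
  n <- nrow(X)
  out <- matrix(NA_real_, n, ncol(X), dimnames = list(NULL, colnames(X)))
  for (i in seq_len(n)) {
    if (i < width) next                      # window not yet full: leave NA
    idx <- (i - width + 1L):i                # trailing window ending at row i
    out[i, ] <- lm.fit(X[idx, , drop = FALSE], y[idx])$coefficients
  }
  out
}

# Simulated data for a single id (assumption: same layout as in the answer)
set.seed(1)
n <- 50
X <- cbind(alpha = 1, Mkt.RF = rnorm(n), SMB.y = rnorm(n), HML.y = rnorm(n))
y <- drop(X %*% c(0.5, 1, 1, 1)) + rnorm(n)

coefs <- naive_roll_coefs(X, y, width = 36L)
tail(coefs, 3)
```

For one id, the last row of coefs should match coef(lm(...)) fitted on the last 36 observations, which is a quick way to sanity-check the columns created above.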

Regarding this statement in your question:

"Using the rollRegres function works with one independent variable but I would like to know how to adjust this package to three independent variables."

This is not true. The functions work for multiple regression. As of this writing, the only examples in both the vignette and the manual pages use multiple regression. See, e.g., help("roll_regres") or vignette("Comparisons", package = "rollRegres").

0

To additionally get the R² in a column the following code can be used:

func <- function(SD){
  # return only the rolling R² from the fit
  roll_regres.fit(
    x = cbind(1, SD$Mkt.RF, SD$SMB.y, SD$HML.y), y = SD$Returns,
    width = 36L, do_compute = c("sigmas", "r.squareds", "1_step_forecasts"))$r.squareds
}
dt[, c("R2") := func(.SD), by = id]

I wonder if an adjusted R² would also be possible?

user9259005
  • There are 36 observations in each window (assuming that you have no missing values) so you [can just use the formulas from wiki](https://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2). – Benjamin Christoffersen Sep 19 '18 at 07:36
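Following that comment, a minimal sketch of the conversion: the adjusted R² is 1 - (1 - R²)(n - 1)/(n - p - 1), with n = 36 observations per window and p = 3 regressors (not counting the intercept). The helper name adj_r2 is made up for illustration, and the sketch assumes every window is complete.

```r
# Adjusted R^2 from R^2 (hypothetical helper).
# n: observations per window, p: regressors excluding the intercept.
adj_r2 <- function(r2, n = 36L, p = 3L) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_r2(c(0.50, 0.90))
# [1] 0.453125 0.890625

# With the R2 column from the code above (assumes dt is the data.table used there):
# dt[, adj_R2 := adj_r2(R2)]
```

NA values in R2 (windows that are not yet full) simply propagate to NA in the adjusted column.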