0

I want to calculate betas and intercept regressing returns on index factor for every date in the dataset, grouping by company, and using the interval [-50,-10] prior to that date. I would also like to save the intercept and the beta in two new columns in the data frame.

   Company      Return       date indexfactor
1:    TSLA          NA 2020-01-02           1
2:    TSLA  0.02963319 2020-01-03           2
3:    TSLA  0.01925467 2020-01-06           3
4:    TSLA  0.03880052 2020-01-07           1
5:    TSLA  0.04920485 2020-01-08           2
6:    TSLA -0.02194501 2020-01-09           3

eg. the intercept and beta for TSLA on 01.01.2021, I would like to regress the returns on the index factor in the interval -50 and -10 observations back in time and store as the beta and intercept for 01.01.2021.

I have tried this code:

stocks[,as.list(coef(lm(Returns ~ indexfactor))), by=Company] but it does not take into account the relevant interval for calculations.

I would be very greatful if someone could help me!

require(quantmod)
require(zoo)
require(data.table)

# STOCK DATA.FRAME
TSLA <- as.data.frame(getSymbols.yahoo("TSLA", from="2020-01-01", verbose=F, auto.assign=F))
AAPL <- as.data.frame(getSymbols.yahoo("AAPL", from="2020-01-01", verbose=F, auto.assign=F))
MSFT <- as.data.frame(getSymbols.yahoo("MSFT", from="2020-01-01", verbose=F, auto.assign=F))


TSLA$Company <- c("TSLA")
AAPL$Company <- c("AAPL")
MSFT$Company <- c("MSFT")


TSLA$Return <- Delt(TSLA$TSLA.Adjusted)
AAPL$Return <- Delt(AAPL$AAPL.Adjusted)
MSFT$Return <- Delt(MSFT$MSFT.Adjusted)

colnames(TSLA) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted", "Company","Return")
colnames(AAPL) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted", "Company","Return")
colnames(MSFT) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted", "Company", "Return")

stocks <- rbind(TSLA, AAPL, MSFT)
rm(AAPL,MSFT,TSLA)


stocks$date <- as.Date(rownames(stocks), format = "%Y-%m-%d")


stocks$Open <- NULL
stocks$Close <- NULL
stocks$High <- NULL
stocks$Low <- NULL
stocks$Volume <- NULL
stocks$Adjusted <- NULL

stocks$indexfactor <- 1:length(stocks)

stocks <- data.table(stocks)
  • 1
    Just about all the code here is for creating the data table, right? If so, it would be much better to just post a sample of data that reproduces the issue, rather than us needing the requisite packages to create your data – camille Oct 21 '21 at 13:47
  • This is a sample of the data :) I could have made it shorter, but for the sake of running regression i though it would be wise to add such amounts of data – thomas.diridondo Oct 21 '21 at 14:23
  • Right, but rather than us needing to run all the code to make the sample of data, you could just post the output of calling `dput` on a useable subset of it so we can start from there and get right into debugging what the problem is – camille Oct 21 '21 at 16:22
  • Do you have a link/page where I can read up on how to do that? – thomas.diridondo Oct 22 '21 at 06:24
  • The post on [reproducible examples](https://stackoverflow.com/q/5963269/5325862), which is also linked to on the [tag:r] tag – camille Oct 22 '21 at 15:06
  • Also just `?dput` – camille Oct 22 '21 at 15:15

1 Answers1

1

Assuming the library statements and input m shown reproducibly in the Note at the end create a function, cofun (coefficient function), that accepts a zoo object and regresses Return against the lag of indexFactor. dyn$lm is used to allow lm to work properly with the zoo object and lag. stats::lag is used to ensure that base lag (which will dispatch lag.zoo) is used just in case dplyr is loaded as that package clobbers lag. Now use that with rollapplyr to apply it to each set of 60 (=50+10) dates, for each symbol adding an intercept and beta column to m.

library(dyn)

cofun <- function(x) coef(dyn$lm(Return ~ stats::lag(indexFactor, -10), x))
m[, c("Intercept", "beta") := 
      cbind(Return, indexFactor) |>
      zoo() |>
      rollapplyr(60, cofun, by.column = FALSE, fill = NA, coredata = FALSE) |>
      as.data.table(),
    by = Symbol]

Note

library(data.table)
library(quantmod)

f <- function(x) {
  cbind(Return = Delt(Ad(x)), indexFactor = rep(1:3, length = nrow(x))) |>
    fortify.zoo()
}
sym <- c("TSLA", "AAPL", "MSFT")
getSymbols(sym, from="2020-01-01", to = "2021-10-21", env = e <- new.env())
m <- rbindlist(eapply(e, f), use.names = FALSE, id = "Symbol")
names(m) <- c("Symbol", "Date", "Return", "indexFactor")
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you! But I dont think Ive been precise enough in my explanation of what I what to conduct, Ill give an example to illustrate: TSLA 30.12.2021 --> regress returns on index factor between 10.11.2021-20.12.2021 --> intercept and coefficient will count for the date 30.12.2021 – thomas.diridondo Oct 25 '21 at 07:26