
I have two datasets, "bear" and "frog", and I am trying to write a function that takes the columns of "bear" one by one as the dependent variable in a linear regression, using the same regressors in every case. For each column it should print the column name and the regression summary with coefficients, standard errors, t-values, R-squared and residual standard error. I want to get 25 separate outputs.

I tried the following code

  print(lm(bear[,i]~frog$MK_RF+frog$SMB+frog$HML))
  print(colnames(bear[,i])) 
  summary (lm(bear[,i]~frog$Mkt.RF+frog$SMB+frog$HML))}

I wrote this function, but instead of the column name I get NULL, and the summary only shows the coefficients. There is no error message. For a reproducible example:

# dput(head(frog, 10))
frog <- structure(list(date = c(192607L, 192608L, 192609L, 192610L, 192611L, 
192612L, 192701L, 192702L, 192703L, 192704L), Mkt.RF = c(2.96, 
2.64, 0.36, -3.24, 2.53, 2.62, -0.06, 4.18, 0.13, 0.46), SMB = c(-2.3, 
-1.4, -1.32, 0.04, -0.2, -0.04, -0.56, -0.1, -1.6, 0.43), HML = c(-2.87, 
4.19, 0.01, 0.51, -0.35, -0.02, 4.83, 3.17, -2.67, 0.6), RF = c(0.22, 
0.25, 0.23, 0.32, 0.31, 0.28, 0.25, 0.26, 0.3, 0.25)), row.names = c(NA, 
10L), class = "data.frame")

and for the bear dataset

# dput(head(bear[, 1:3], 10)) 
bear <- structure(list(date = c(192607L, 192608L, 192609L, 192610L, 192611L, 
192612L, 192701L, 192702L, 192703L, 192704L), SMALL.LoBM = c(3.5582, 
-2.4574, -6.4413, -8.9441, 3.1644, 13.6658, 0.1974, 2.2284, 6.0998, 
5.5863), ME1.BM2 = c(-0.6319, -8.9775, -0.5289, -4.0732, 6.3376, 
-2.2572, -8.5499, -0.5649, -2.0464, 7.5611)), row.names = c(NA, 
10L), class = "data.frame")
AkselA
Julia
  • this has got to be a duplicate ... https://stackoverflow.com/questions/46822631/r-how-can-i-use-the-apply-functions-instead-of-iterating ; https://stackoverflow.com/questions/54060985/function-which-runs-lm-over-different-variables; https://stackoverflow.com/questions/54907726/running-multiple-linear-regressions-across-several-columns-of-a-data-frame-in-r; https://stackoverflow.com/questions/45719732/r-automated-loop-of-linear-regressions-using-same-ivs-on-different-dvs-to-store ... – Ben Bolker May 31 '19 at 19:33
  • Can't you just do `lm(bear ~ frog$MK_RF+frog$SMB+frog$HML)`? I.e., multivariate multiple regression. https://data.library.virginia.edu/getting-started-with-multivariate-multiple-regression/ – AkselA May 31 '19 at 19:37
  • No, i get error message "invalid type (list) for variable bear" – Julia May 31 '19 at 19:44
  • @Maria: Try converting it to matrix `lm(as.matrix(bear) ~ frog$MK_RF+frog$SMB+frog$HML)` – AkselA May 31 '19 at 19:50
  • @AkselA error message again "Error in model.frame.default(formula = as.matrix(bear) ~ frog$MK_RF + : invalid type (NULL) for variable 'frog$MK_RF'". – Julia May 31 '19 at 19:53
  • `dput(head(frog, 10)) structure(list(date = c(192607L, 192608L, 192609L, 192610L, 192611L, 192612L, 192701L, 192702L, 192703L, 192704L), Mkt.RF = c(2.96, 2.64, 0.36, -3.24, 2.53, 2.62, -0.06, 4.18, 0.13, 0.46), SMB = c(-2.3, -1.4, -1.32, 0.04, -0.2, -0.04, -0.56, -0.1, -1.6, 0.43), HML = c(-2.87, 4.19, 0.01, 0.51, -0.35, -0.02, 4.83, 3.17, -2.67, 0.6), RF = c(0.22, 0.25, 0.23, 0.32, 0.31, 0.28, 0.25, 0.26, 0.3, 0.25)), row.names = c(NA, 10L), class = "data.frame")` and for `dput(head(bear, 10))` the output is too long to be posted, it has 2575 characters.. – Julia May 31 '19 at 20:07
  • Good. Now we can both work with the same data. You may delete the unneeded comments. – AkselA May 31 '19 at 20:23

2 Answers


I'd strongly recommend that you merge your data frames: relying on the row ordering being consistent is dangerous. The only reason not to is if your data sets are enormous and you can't afford the extra memory consumption.

bear_vars <- names(bear)[-1]
frog_vars <- names(frog)[-1]
bf <- merge(bear, frog, by = "date")

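Now loop, using reformulate() to build a linear model formula with the values in frog_vars as the predictor (independent) variables and each value in bear_vars as the response (dependent) variable. Note that frog_vars as defined above also contains RF; to use only the three regressors from the question, set it to c("Mkt.RF", "SMB", "HML") instead: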

for (b in bear_vars) {
    m <- lm(reformulate(frog_vars, response=b), data=bf)
    cat(b,"\n")
    print(m)
    print(summary(m))
}
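As an aside on the two symptoms in the question, here is a minimal sketch using the posted sample data (rewritten with data.frame() for brevity; `fit` is just an illustrative name). Indexing a data frame with a single column number, as in `bear[, i]`, drops the result to a plain vector, which has no column names, so `colnames()` returns NULL. And assuming the original code ran inside a for loop, values there are not auto-printed, so `summary()` has to be wrapped in `print()`.

```r
# Sample data from the question (first 10 rows, 2 of the 25 response columns)
frog <- data.frame(
  date   = c(192607L, 192608L, 192609L, 192610L, 192611L,
             192612L, 192701L, 192702L, 192703L, 192704L),
  Mkt.RF = c(2.96, 2.64, 0.36, -3.24, 2.53, 2.62, -0.06, 4.18, 0.13, 0.46),
  SMB    = c(-2.3, -1.4, -1.32, 0.04, -0.2, -0.04, -0.56, -0.1, -1.6, 0.43),
  HML    = c(-2.87, 4.19, 0.01, 0.51, -0.35, -0.02, 4.83, 3.17, -2.67, 0.6)
)
bear <- data.frame(
  date       = frog$date,
  SMALL.LoBM = c(3.5582, -2.4574, -6.4413, -8.9441, 3.1644,
                 13.6658, 0.1974, 2.2284, 6.0998, 5.5863),
  ME1.BM2    = c(-0.6319, -8.9775, -0.5289, -4.0732, 6.3376,
                 -2.2572, -8.5499, -0.5649, -2.0464, 7.5611)
)

i <- 2
colnames(bear[, i])   # bear[, i] is a plain numeric vector, so no column names
colnames(bear)[i]     # index the vector of names instead

# Index-based version of the loop from the question (column 1 is the date)
for (i in seq_along(bear)[-1]) {
  fit <- lm(bear[, i] ~ Mkt.RF + SMB + HML, data = frog)
  cat("\n==", colnames(bear)[i], "==\n")
  print(summary(fit))  # explicit print() is needed inside a loop
}
```

This version still relies on the rows of bear and frog being in the same order, which is exactly what the merge above avoids.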

You can use tidyverse methods if you want, but this should work OK.
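If the summaries are needed later rather than only printed, one possible variation is to store them in a named list. The sketch below assumes the posted sample data and the three regressors from the question; `fits` and `stats` are illustrative names, not part of the original answer.

```r
# Sample data from the question (first 10 rows, 2 of the 25 response columns)
frog <- data.frame(
  date   = c(192607L, 192608L, 192609L, 192610L, 192611L,
             192612L, 192701L, 192702L, 192703L, 192704L),
  Mkt.RF = c(2.96, 2.64, 0.36, -3.24, 2.53, 2.62, -0.06, 4.18, 0.13, 0.46),
  SMB    = c(-2.3, -1.4, -1.32, 0.04, -0.2, -0.04, -0.56, -0.1, -1.6, 0.43),
  HML    = c(-2.87, 4.19, 0.01, 0.51, -0.35, -0.02, 4.83, 3.17, -2.67, 0.6)
)
bear <- data.frame(
  date       = frog$date,
  SMALL.LoBM = c(3.5582, -2.4574, -6.4413, -8.9441, 3.1644,
                 13.6658, 0.1974, 2.2284, 6.0998, 5.5863),
  ME1.BM2    = c(-0.6319, -8.9775, -0.5289, -4.0732, 6.3376,
                 -2.2572, -8.5499, -0.5649, -2.0464, 7.5611)
)

bf <- merge(bear, frog, by = "date")
bear_vars <- setdiff(names(bear), "date")
frog_vars <- c("Mkt.RF", "SMB", "HML")  # assumption: the question's regressors

# One summary.lm object per response column, named after that column
fits <- lapply(setNames(bear_vars, bear_vars), function(b) {
  summary(lm(reformulate(frog_vars, response = b), data = bf))
})

# Coefficient table (estimate, standard error, t value, p value) for one response
coef(fits[["SMALL.LoBM"]])

# R-squared and residual standard error for every response, one row each
stats <- t(sapply(fits, function(s) c(r.squared = s$r.squared, sigma = s$sigma)))
stats
```

Naming the list by column keeps each result tied to its response variable, so no separate bookkeeping of column names is needed.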

Ben Bolker

The error occurred because you used the wrong name for one of the variables (there is no frog$MK_RF). The correct call would be

lm(as.matrix(bear) ~ frog$Mkt.RF+frog$SMB+frog$HML)

or

mmod <- lm(as.matrix(bear) ~ Mkt.RF + SMB + HML, data=frog)
summary(mmod)

This gives precisely the same coefficients, standard errors, t-values etc. as if you had looped over the columns in bear individually. Doing it this way has multiple advantages, however.
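The equivalence can be checked on the sample data with a sketch like the following. One assumption that differs from the calls above: the date column is dropped from bear before calling as.matrix(), since otherwise date would be treated as one more response. The names `Y`, `single`, `se_multi` and `se_single` are illustrative.

```r
# Sample data from the question (first 10 rows, 2 of the 25 response columns)
frog <- data.frame(
  date   = c(192607L, 192608L, 192609L, 192610L, 192611L,
             192612L, 192701L, 192702L, 192703L, 192704L),
  Mkt.RF = c(2.96, 2.64, 0.36, -3.24, 2.53, 2.62, -0.06, 4.18, 0.13, 0.46),
  SMB    = c(-2.3, -1.4, -1.32, 0.04, -0.2, -0.04, -0.56, -0.1, -1.6, 0.43),
  HML    = c(-2.87, 4.19, 0.01, 0.51, -0.35, -0.02, 4.83, 3.17, -2.67, 0.6)
)
bear <- data.frame(
  date       = frog$date,
  SMALL.LoBM = c(3.5582, -2.4574, -6.4413, -8.9441, 3.1644,
                 13.6658, 0.1974, 2.2284, 6.0998, 5.5863),
  ME1.BM2    = c(-0.6319, -8.9775, -0.5289, -4.0732, 6.3376,
                 -2.2572, -8.5499, -0.5649, -2.0464, 7.5611)
)

# Multivariate fit: every non-date column of bear is a response
Y <- as.matrix(bear[-1])
mmod <- lm(Y ~ Mkt.RF + SMB + HML, data = frog)

# Separate fit for a single response, for comparison
single <- lm(bear$SMALL.LoBM ~ Mkt.RF + SMB + HML, data = frog)

# Coefficients: column 1 of the coefficient matrix vs. the single fit
all.equal(coef(mmod)[, 1], coef(single))

# Standard errors: summary() of a multivariate fit is a list with one
# summary.lm per response, in the column order of Y
se_multi  <- coef(summary(mmod)[[1]])[, "Std. Error"]
se_single <- coef(summary(single))[, "Std. Error"]
all.equal(se_multi, se_single)
```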

Try, for example:

anova(mmod)
coef(mmod)
residuals(mmod)

Very handy.
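To give a rough idea of what these accessors return for a multivariate fit, here is a small sketch on the sample data. It again assumes the date column is dropped from bear first; `Y`, `s` and `r2` are illustrative names.

```r
# Sample data from the question (first 10 rows, 2 of the 25 response columns)
frog <- data.frame(
  date   = c(192607L, 192608L, 192609L, 192610L, 192611L,
             192612L, 192701L, 192702L, 192703L, 192704L),
  Mkt.RF = c(2.96, 2.64, 0.36, -3.24, 2.53, 2.62, -0.06, 4.18, 0.13, 0.46),
  SMB    = c(-2.3, -1.4, -1.32, 0.04, -0.2, -0.04, -0.56, -0.1, -1.6, 0.43),
  HML    = c(-2.87, 4.19, 0.01, 0.51, -0.35, -0.02, 4.83, 3.17, -2.67, 0.6)
)
bear <- data.frame(
  date       = frog$date,
  SMALL.LoBM = c(3.5582, -2.4574, -6.4413, -8.9441, 3.1644,
                 13.6658, 0.1974, 2.2284, 6.0998, 5.5863),
  ME1.BM2    = c(-0.6319, -8.9775, -0.5289, -4.0732, 6.3376,
                 -2.2572, -8.5499, -0.5649, -2.0464, 7.5611)
)

Y <- as.matrix(bear[-1])  # assumption: date is not a response
mmod <- lm(Y ~ Mkt.RF + SMB + HML, data = frog)

dim(coef(mmod))       # one row per term, one column per response
dim(residuals(mmod))  # one row per observation, one column per response

# summary() returns one summary.lm per response, named "Response <column>"
s <- summary(mmod)
names(s)

# R-squared for every response in one call
r2 <- sapply(s, function(x) x$r.squared)
r2
```

With all 25 response columns in `Y`, the same calls would return 25 columns or list entries without any explicit loop.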

AkselA