I am performing a regression using data.table. However, I would like to dynamically specify the right-hand side of the formula as in the following example:

library(data.table)

dt <- data.table(data.frame("a"=seq(1,10,1), 
                            "b"=seq(1,20,2), 
                            "c"=seq(1,30,31),
                            "d"=c("one", "one", "one", "one", "one", 
                                  "two", "two", "two", "two", "two")))
varname <- "a"

dt[, lmtest::coeftest(x = lm(get(varname) ~ b),
                      vcov. = sandwich::NeweyWest(x = lm(get(varname) ~ b)) ), by = "d"]

do.regr <- function(rhs) {
  dt[, lmtest::coeftest(x = lm(get(varname) ~ rhs),
                        vcov. = sandwich::NeweyWest(x = lm(get(varname) ~ rhs)) ), by = "d"]
}

do.regr("b+c")

This gives an error, because lm() treats rhs as a variable name rather than as the contents of the string. Is there a way to pass a string and have it used as the right-hand side of the formula?

Tyler D

3 Answers


You can use reformulate() to create a formula object from character strings.

do.regr <- function(dt, varname, rhs){
  lmtest::coeftest(x = lm(reformulate(rhs, varname), data = dt))
}

do.regr(dt, varname, c('b', 'c'))
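
As a quick check of what reformulate() builds, here is a minimal sketch using only base R (the stats package). The names f_vec and f_str are illustrative and not part of the answer above; both input styles should yield the same formula.

```r
# reformulate() joins the term labels with "+" and parses the result,
# so a character vector and a single "b+c" string give the same formula.
f_vec <- reformulate(c("b", "c"), response = "a")
f_str <- reformulate("b+c", response = "a")

print(f_vec)
print(all.vars(f_str))   # variable names referenced by the formula
```

This means the question's do.regr("b+c") style of input can be handed to reformulate() unchanged, at least for the ungrouped case.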
Ronak Shah
  • Thanks! I updated the question to generalize the situation a bit more - does `reformulate` still work? – Tyler D Sep 07 '20 at 11:14
  • It doesn't work as it is if you want to apply this by group. Maybe try Roland's approach; the other option I can think of is splitting by `d` and applying reformulate to each element of the list. – Ronak Shah Sep 07 '20 at 13:03
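
The split-by-group idea from the comment above could be sketched roughly as follows. This is an assumption-laden illustration, not the commenter's code: it uses a plain data.frame with made-up values, base split()/lapply() in place of data.table's by, and a hypothetical helper fit_by_group(). Only coef() is extracted here; lmtest::coeftest() with sandwich::NeweyWest() from the question could be applied to each fitted model in the same way.

```r
# Toy data with the question's column names (values are made up for illustration)
df <- data.frame(
  a = c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.9),
  b = seq(1, 20, 2),
  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3),
  d = rep(c("one", "two"), each = 5)
)

# Hypothetical helper: build the formula once, then fit one model per group
fit_by_group <- function(data, lhs, rhs, group) {
  f <- reformulate(rhs, response = lhs)
  lapply(split(data, data[[group]]), function(part) lm(f, data = part))
}

fits  <- fit_by_group(df, "a", c("b", "c"), "d")
coefs <- sapply(fits, coef)   # one column of coefficients per group
print(coefs)
```

The result is a named list of lm objects, one per level of d, so any per-model summary can be applied afterwards with lapply() or sapply().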

You can compute on the language:

do.regr <- function(rhs) {
  rhs <- parse(text = rhs)[[1]]
  varname <- as.symbol(varname)
  eval(bquote(dt[, lmtest::coeftest(x = lm(.(varname) ~ .(rhs)),
                        vcov. = sandwich::NeweyWest(x = lm(.(varname) ~ .(rhs))) ), by = "d"]))
}

do.regr("b+c")
#works

The only disadvantage is that you can't use data.table's dot alias, .(), inside the expression, because bquote would (try to) substitute it. If that becomes an issue, you could use substitute instead of bquote.
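
A minimal sketch of the substitute variant, under a few assumptions: the placeholder names LHS and RHS, the helper build_call(), and the coefs column are all illustrative, and the call is only built and printed here, not evaluated, so no data.table is needed to run it.

```r
# Hypothetical helper: substitute() only replaces symbols found in `env`,
# so data.table's .() alias is left untouched in the resulting call.
build_call <- function(lhs, rhs) {
  env <- list(LHS = as.symbol(lhs), RHS = parse(text = rhs)[[1]])
  substitute(dt[, .(coefs = list(coef(lm(LHS ~ RHS)))), by = "d"], env)
}

cl <- build_call("a", "b+c")
print(cl)
# eval(cl) would then run the call against a data.table named dt
```

Because the placeholders are ordinary symbols rather than .() wrappers, the template reads like the final data.table call with two names swapped in.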

Roland

The OP asked to pass the right-hand side as a parameter to the do.regr() function, but the left-hand side is specified dynamically as well, through the variable varname.

Therefore, I suggest passing both the left-hand side and the right-hand side as parameters to do.regr().

The approach below picks up Roland's suggestion to compute on the language but uses glue::glue() for string interpolation and a helper function EVAL(), which I find more readable:

EVAL <- function(...) eval(parse(text = paste0(...)), envir = parent.frame(2))
do.regr <- function(lhs, rhs) {
  EVAL(glue::glue("dt[, lmtest::coeftest(x = lm({lhs} ~ {rhs}),
                        vcov. = sandwich::NeweyWest(x = lm({lhs} ~ {rhs})) ), by = d]"))
}

The helper function EVAL() has been suggested by Matt Dowle in order to create one expression to be evaluated, "similar to constructing a dynamic SQL statement to send to a server".

The function can be called as

do.regr("a", "b+c")

or

do.regr(varname, "b+c")

and returns the same result as

dt[, lmtest::coeftest(x = lm(a ~ b+c),
                      vcov. = sandwich::NeweyWest(x = lm(a ~ b+c)) ), by = d]
      d           V1
 1: one 5.000000e-01
 2: one 5.000000e-01
 3: one 1.046778e-16
 4: one 2.738990e-17
 5: one 4.776564e+15
 6: one 1.825490e+16
 7: one 2.023596e-47
 8: one 3.625202e-49
 9: two 5.000000e-01
10: two 5.000000e-01
11: two 3.289712e-16
12: two 2.416974e-17
13: two 1.519890e+15
14: two 2.068702e+16
15: two 6.281080e-46
16: two 2.491017e-49
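
The string-building idea behind EVAL() can also be seen in isolation. The sketch below is a base-R illustration under stated assumptions: sprintf() stands in for glue::glue(), a plain data.frame df with made-up values replaces dt, and build_cmd() and run_cmd() are hypothetical helper names. EVAL() itself is copied from the answer above.

```r
# Toy data (values are made up for illustration)
df <- data.frame(
  a = c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.9),
  b = seq(1, 20, 2),
  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
)

# EVAL() as defined above: parse the pasted string and evaluate it
# in the frame that called the wrapper function
EVAL <- function(...) eval(parse(text = paste0(...)), envir = parent.frame(2))

# Step 1: build the command as plain text, like a dynamic SQL statement
build_cmd <- function(lhs, rhs) sprintf("coef(lm(%s ~ %s, data = df))", lhs, rhs)

# Step 2: evaluate that text where df is visible
run_cmd <- function(lhs, rhs) EVAL(build_cmd(lhs, rhs))

cmd <- build_cmd("a", "b + c")
print(cmd)
res <- run_cmd("a", "b + c")
print(res)
```

Printing the command string before evaluating it is a cheap way to debug this style of code, since the text is exactly what would be typed at the console.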
Uwe