
For long, repetitive models I want to create a "macro" (as it is called in Stata, where it is done with global var1 var2 ...) that contains the regressors of the model formula.

For example from

library(car)
lm(income ~ education + prestige, data = Duncan)

I want something like:

regressors <- c("education", "prestige")
lm(income ~ @regressors, data = Duncan)  

All I could find is this approach, but applying it to the regressors does not work:

reg = lm(income ~ bquote(y ~ .(regressors)), data = Duncan)

as it throws me:

Error in model.frame.default(formula = y ~ bquote(.y ~ (regressors)), data =
Duncan,  :  invalid type (language) for variable 'bquote(.y ~ (regressors))'

Even the accepted answer to the same question:

lm(formula(paste('var ~ ', regressors)), data = Duncan)

fails with:

Error in model.frame.default(formula = formula(paste("var ~ ", regressors)),
: object is not a matrix

And of course I tried as.matrix(regressors) :)

So, what else can I do?

jay.sf
  • Just use the other answer at that question. I'm not sure how easy bquote is going to be to adapt to a variable number of covariates. – joran Oct 07 '17 at 01:11
  • @joran: Did not work either, I stated the error message in my question which I also made a bit more reproducible now. – jay.sf Oct 07 '17 at 01:42
  • Several demonstrations below of how the answer I referred to works. Don't give up so easily! ;) – joran Oct 07 '17 at 03:35
  • @joran I'm hanging on! – jay.sf Oct 07 '17 at 09:47
  • BTW since we shouldn't call it "macro" how do we call it in R? – jay.sf Oct 07 '17 at 09:48
  • FWIW, using a `global` like this to hold variable names is widely deprecated as bad style in the Stata community. There are many better ways to pass lists of names between programs or commands or other code chunks on a need-to-know basis. – Nick Cox Oct 11 '17 at 08:13
  • jaySf: *"list of (string) names of regressors"*, I expect. You generally wouldn't need it to be global, though. – smci Mar 09 '18 at 21:39

2 Answers


For the scenario you described, where regressors is in the global environment, you could use:

lm(as.formula(paste("income ~", paste(regressors, collapse = " + "))), data = Duncan)
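To see why the collapse argument matters here, a minimal sketch follows. It assumes the built-in mtcars data as a stand-in for Duncan so that no package is needed; the variable names separate, rhs and fo are only illustrative.

```r
# Assumption: mtcars replaces Duncan so the sketch runs in base R.
regressors <- c("wt", "hp")

# Without collapse, paste() is vectorised: one string per regressor.
separate <- paste("mpg ~", regressors)
length(separate)  # two separate strings, not one formula

# With collapse, the regressors are joined into a single right-hand side.
rhs <- paste(regressors, collapse = " + ")
fo <- as.formula(paste("mpg ~", rhs))

fit <- lm(fo, data = mtcars)
coef(fit)
```

Pasting a vector of names without collapse is probably part of what goes wrong in the attempt quoted in the question, since it produces several strings instead of one formula.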

Alternatively, you could use a function:

modincome <- function(regressors){
    lm(as.formula(paste("income ~", paste(regressors, collapse = " + "))), data = Duncan)
}

modincome(c("education", "prestige"))
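If the response and the data set should not be hard-coded, the same idea can be wrapped more generally. This is only a sketch: fit_lm and its argument names are hypothetical, and mtcars is assumed in place of Duncan so the code runs in base R.

```r
# Hypothetical helper: name and arguments are illustrative only.
fit_lm <- function(response, regressors, data) {
  # Build "response ~ x1 + x2 + ..." from character inputs.
  rhs <- paste(regressors, collapse = " + ")
  fo <- as.formula(paste(response, "~", rhs))
  lm(fo, data = data)
}

# Assumption: mtcars stands in for Duncan.
fit <- fit_lm("mpg", c("wt", "hp"), mtcars)
coef(fit)
```

Passing the data frame as an argument avoids relying on an object in the global environment.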
Julia Wilkerson

Here are some alternatives. No packages are used in the first 3.

1) reformulate

fo <- reformulate(regressors, response = "income")
lm(fo, Duncan)

or you may wish to write the last line as this so that the formula that is shown in the output looks nicer:

do.call("lm", list(fo, quote(Duncan)))

in which case the Call: line of the output appears as expected, namely:

Call:
lm(formula = income ~ education + prestige, data = Duncan)
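The difference between the two calls can be inspected directly on the fitted objects. The sketch below assumes mtcars instead of Duncan so it runs without packages; fit_plain and fit_docall are illustrative names.

```r
# Assumption: mtcars stands in for Duncan.
regressors <- c("wt", "hp")
fo <- reformulate(regressors, response = "mpg")

fit_plain  <- lm(fo, mtcars)
fit_docall <- do.call("lm", list(fo, quote(mtcars)))

# The plain call records the symbol fo as its formula argument,
# whereas do.call() passes the evaluated formula object itself.
print(fit_plain$call)
print(fit_docall$call)

# Both fits estimate the same model.
all.equal(coef(fit_plain), coef(fit_docall))
```

Only the recorded call differs; the estimates are the same either way.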

2) lm(dataframe)

lm( Duncan[c("income", regressors)] )

The Call: line of the output looks like this:

Call:
lm(formula = Duncan[c("income", regressors)])

but we can make it look exactly as in the do.call solution in (1) with this code:

fo <- formula(model.frame(income ~., Duncan[c("income", regressors)]))
do.call("lm", list(fo, quote(Duncan)))
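The data frame approach relies on the first column being treated as the response and the remaining columns as regressors, which is why "income" is listed first. A small sketch, assuming mtcars in place of Duncan (sub, fit_df and fit_fo are illustrative names):

```r
# Assumption: mtcars stands in for Duncan.
regressors <- c("wt", "hp")

# Put the response first, then the regressors.
sub <- mtcars[c("mpg", regressors)]

# formula() on a data frame uses the first column as the response.
fo <- formula(sub)
print(fo)

# Fitting from the data frame matches the explicit formula.
fit_df <- lm(sub)
fit_fo <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(coef(fit_df), coef(fit_fo))
```

Reordering the columns would change which variable is the response, so the column order matters with this approach.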

3) dot

An alternative similar to that suggested by @jenesaisquoi in the comments is:

lm(income ~., Duncan[c("income", regressors)])

The approach discussed in (2) to the Call: output also works here.
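To check which regressors the dot actually picks up, the expanded terms can be inspected. The following sketch assumes mtcars in place of Duncan; sub, tt and fit are illustrative names.

```r
# Assumption: mtcars stands in for Duncan.
regressors <- c("wt", "hp")
sub <- mtcars[c("mpg", regressors)]

# The dot expands to every column of the data except the response,
# so subsetting the data frame controls which regressors enter.
tt <- terms(mpg ~ ., data = sub)
attr(tt, "term.labels")

fit <- lm(mpg ~ ., sub)
coef(fit)
```

With the full data frame instead of the subset, the dot would pull in all remaining columns.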

4) fn$

Prefacing a function with fn$ enables string interpolation in its arguments. This solution is nearly identical to the desired syntax shown in the question, using $ in place of @ to perform the substitution, and the flexible substitution could readily extend to more complex scenarios. The quote(Duncan) in the code could be written as just Duncan and it would still run, but the Call: shown in the lm output looks better if you use quote(Duncan).

library(gsubfn)

rhs <- paste(regressors, collapse = "+")
fn$lm("income ~ $rhs", quote(Duncan))

The Call: line looks almost identical to the do.call solutions above -- only spacing and quotes differ:

Call:
lm(formula = "income ~ education+prestige", data = Duncan)

If you wanted it absolutely the same then:

fo <- fn$formula("income ~ $rhs")
do.call("lm", list(fo, quote(Duncan)))
G. Grothendieck