I'm trying to run a regression with roughly 20 explanatory variables on a dataset that has 50 variables. The call looks something like:

lm(data=data, formula = y ~ explanatory_1 + ... + explanatory_20)

This works fine, but we want the code to look a little cleaner. Many answers suggest using `.` and subtracting the variables you don't want; however, I'd rather not do that, because the dataset has about 20 or so variables that we don't use in the regression, i.e. we'd be subtracting as many variables as we include in the regression.

Is there a way to group the explanatory variables into a list, so that the call could instead look like this?

lm(data=data, formula = y ~ list)

Furthermore, in some specifications we include a new covariate that also interacts with each of the original covariates, so ideally we would have

lm(data=data, formula = y ~ list + new_var + new_var:list)

Can this be done? Thanks!

  • Subset your data so it only includes the columns you want, then use `y ~ .`. If you want `foo` to interact with everything use `y ~ foo * .` – Gregor Thomas Nov 10 '21 at 20:58
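The subsetting approach from the comment above can be sketched like this on the built-in `mtcars` data. The choice of `wt` as the extra interacting covariate is purely illustrative; with a `data` argument, `.` stands for the columns not otherwise in the formula, so `wt * .` interacts `wt` with the remaining columns.

```r
# Explanatory variables to keep
x_vars <- c("cyl", "disp", "hp")

# Keep only the response and the explanatory variables, so `.` means "all the rest"
dat     <- mtcars[, c("mpg", x_vars)]
fit_dot <- lm(mpg ~ ., data = dat)

# Same model written out explicitly, for comparison
fit_explicit <- lm(mpg ~ cyl + disp + hp, data = mtcars)

# Interact one extra covariate (wt, chosen only for illustration) with the others;
# `.` expands to cyl, disp and hp here
dat_int     <- mtcars[, c("mpg", "wt", x_vars)]
fit_dot_int <- lm(mpg ~ wt * ., data = dat_int)
```

The drawback is that the data frame has to be subset separately for each specification, which is why the vector-based approach in the answer below may be more convenient.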

1 Answer


You can put the explanatory variables in a character vector and use `reformulate()`:

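# Names of the explanatory variables, as a character vector
x_vars <- c('cyl', 'disp', 'hp')
# reformulate() builds the formula mpg ~ cyl + disp + hp from the vector
lm(data = mtcars, formula = reformulate(x_vars, response = 'mpg'))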
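A small sketch showing that `reformulate()` simply returns an ordinary formula object, which you can print and inspect before fitting. The `paste()`/`as.formula()` version is a commonly used alternative (not part of the answer itself) that builds the same formula from a string.

```r
x_vars <- c("cyl", "disp", "hp")

# reformulate() turns the character vector into a formula object
f1 <- reformulate(x_vars, response = "mpg")
print(f1)  # mpg ~ cyl + disp + hp

# Alternative: build the formula as a string, then convert it
f2 <- as.formula(paste("mpg ~", paste(x_vars, collapse = " + ")))

# Both formulas describe the same model
fit1 <- lm(f1, data = mtcars)
fit2 <- lm(f2, data = mtcars)
all.equal(coef(fit1), coef(fit2))
```

Keeping the variable names in a vector means each specification only changes `x_vars`, not the `lm()` call.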
otheracct
  • Thanks! Is it possible to include interactions in reformulate, e.g. something like `lm(data = mtcars, formula = reformulate(c(x_vars, x_vars:'new_var'), response = 'mpg'))`? – MambaMentality Nov 10 '21 at 21:36
  • @MambaMentality; see https://stackoverflow.com/questions/4951442/formula-with-dynamic-number-of-variables#13371468 – user20650 Nov 10 '21 at 23:23
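A sketch of how interactions can be passed to `reformulate()`: its term labels are pasted together and parsed as a formula, so strings such as `"wt:cyl"` work, and they can be generated with `paste()`. (Writing `x_vars:'new_var'` directly won't work as intended, because outside a formula `:` is R's sequence operator.) Here `wt` is only a hypothetical stand-in for `new_var`, and the `*` shorthand is shown as an equivalent way to get the same terms.

```r
# Explanatory variables and the extra covariate (wt is only a stand-in for new_var)
x_vars  <- c("cyl", "disp", "hp")
new_var <- "wt"

# Main effects, the new covariate, and one "new_var:x" label per explanatory variable
rhs   <- c(x_vars, new_var, paste(new_var, x_vars, sep = ":"))
f_int <- reformulate(rhs, response = "mpg")
print(f_int)  # mpg ~ cyl + disp + hp + wt + wt:cyl + wt:disp + wt:hp

fit_int <- lm(f_int, data = mtcars)

# Equivalent shorthand: new_var * (x1 + x2 + ...) expands to the same terms
f_star   <- reformulate(paste0(new_var, " * (", paste(x_vars, collapse = " + "), ")"),
                        response = "mpg")
fit_star <- lm(f_star, data = mtcars)

# Same model, so the fitted values agree
all.equal(fitted(fit_int), fitted(fit_star))
```

The explicit version mirrors the `y ~ list + new_var + new_var:list` form from the question; the `*` version is shorter but produces the same model.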