Regression with many variables, but not enough to justify using . and subtracting unnecessary variables

Question

I'm trying to run a regression with roughly 20 variables, in a dataset that has 50 variables. So it looks something like:

lm(data=data, formula = y ~ explanatory_1 + ... + explanatory_20)

Obviously this works fine, but we want the code to look a little cleaner. A lot of answers tell you to use `.`; however, I don't want to do that, because the dataset has about 20 or so variables that we don't use in the regression, i.e. we'd be subtracting as many variables as we include in the regression.

Is there a way to group the explanatory vars into a list, so it can instead look like

lm(data=data, formula = y ~ list)?

Furthermore, in some specifications we include a new covariate that also acts as an interaction term on all the original covariates, so ideally we would have

lm(data=data, formula = y ~ list + new_var + new_var:list).

Can this be done? Thanks!

Subset your data so it only includes the columns you want, then use `y ~ .`. If you want `foo` to interact with everything use `y ~ foo * .` — Gregor Thomas, Nov 10 '21 at 20:58
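The subsetting suggestion in this comment can be sketched roughly as follows. This is only an illustration on the built-in `mtcars` data: the choice of `cyl`, `disp`, and `hp` as explanatory variables and of `wt` as the extra interacting covariate are assumptions, not part of the original question.

```r
# Illustrative sketch on mtcars; variable choices are assumptions.
x_vars <- c("cyl", "disp", "hp")

# Keep only the response and the explanatory variables, then let `.` pick them up
dat <- mtcars[, c("mpg", x_vars)]
fit_main <- lm(mpg ~ ., data = dat)

# Add the extra covariate (wt here) to the subset and interact it with everything
dat_int <- mtcars[, c("mpg", "wt", x_vars)]
fit_int <- lm(mpg ~ wt * ., data = dat_int)

names(coef(fit_main))
names(coef(fit_int))
```

Because `.` expands to every non-response column of the data passed in, the subset has to contain exactly the columns you want in the model.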

score 3 · Accepted Answer · answered Nov 10 '21 at 20:59

You can put the explanatory variables in a character vector and build the formula with `reformulate`:

x_vars <- c('cyl', 'disp', 'hp')
lm(data = mtcars, formula = reformulate(x_vars, response = 'mpg'))
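For the specification in the question that adds a new covariate and interacts it with every original covariate, one possible approach is to build the interaction labels as strings and pass them to `reformulate` along with the main effects. The sketch below assumes `wt` plays the role of `new_var`, and the names `new_var`, `inter_terms`, `f`, and `fit` are illustrative only.

```r
# Sketch only: `wt` stands in for the question's `new_var` (an assumption).
x_vars  <- c("cyl", "disp", "hp")
new_var <- "wt"

# Build interaction labels such as "cyl:wt" as plain strings
inter_terms <- paste(x_vars, new_var, sep = ":")

# Main effects, the new covariate, and its interaction with each original covariate
f <- reformulate(c(x_vars, new_var, inter_terms), response = "mpg")
fit <- lm(f, data = mtcars)

f
```

Since `reformulate` just pastes the term labels together with `+`, any label that is valid on the right-hand side of a formula (including `a:b`) can be included in the vector.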

otheracct


Thanks! Is it possible to include interactions in reformulate, e.g. something like `lm(data = mtcars, formula = reformulate(c(x_vars, x_vars:'new_var'), response = 'mpg'))` ? – MambaMentality Nov 10 '21 at 21:36

@MambaMentality see https://stackoverflow.com/questions/4951442/formula-with-dynamic-number-of-variables#13371468 – user20650 Nov 10 '21 at 23:23
