
I want to pass a comma-separated vector I've created manually directly to a glm model. The model formula requires predictors to be separated by a plus sign (+), so I was wondering whether there is a clever way to replace the commas with plus signs as I pass the vectors to the model.

For example, say I have these two vectors:

fruits <- c('apples', 'bananas', 'pears', 'apricots')
colors <- c('blue', 'red', 'orange', 'purple')

At the moment, I'm just copying the predictors and adding the + signs manually, e.g.:

glm(dependent_var ~ apples + bananas + pears + apricots + blue + red + orange + purple, data = df, family = "binomial")

What I'd love to do is make this less manual. Is there a way to simply pass in the vector names themselves? Something like:

glm(dependent_var ~ fruits + colors, data = df, family = "binomial")
C.Robin
  • Does this answer your question? [How to succinctly write a formula with many variables from a data frame?](https://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame) – Godrim Nov 16 '21 at 10:46
  • I think the second (unaccepted) answer to that question is practically the same as deschen's below. It also includes the dependent variable in the formula call, though, which I would prefer to exclude. The best answer for my case is G. Grothendieck's suggestion to use `reformulate`, which is included in one of the answers to that question but buried among others – C.Robin Nov 16 '21 at 10:50
  • Fair enough, but I think there was value in keeping it open ^^ The question Godrim shared was specifically about including all variables in a df as the predictors in a model. That's a special case of the more general question I've posed, which arguably has greater value to folks learning R – C.Robin Nov 16 '21 at 10:56

3 Answers


1) Use reformulate:

fo <- reformulate(c(fruits, colors), "dep_var"); fo
## dep_var ~ apples + bananas + pears + apricots + blue + red + 
##     orange + purple

glm(fo, data = df, family = "binomial")   

Note that if you pass a variable, fo, to glm then the Call: line of the output will show literally fo.

fo <- reformulate("Time", "demand")
glm(fo, data = BOD)
## 
## Call:  glm(formula = fo, data = BOD)
## ...

To get it to show the contents of fo but not the contents of BOD use do.call and quote like this:

do.call("glm", list(fo, data = quote(BOD)))
##
## Call:  glm(formula = demand ~ Time, data = BOD)

Alternatively, assign fo back into the call stored in the "glm" object:

fm <- glm(fo, data = BOD)
fm$call[[2]] <- fo
fm
##
## Call:  glm(formula = demand ~ Time, data = BOD)
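As a minimal sketch of approach 1 on built-in data, the formula can be built from two vectors and the stored call inspected. The vectors engine and chassis are hypothetical stand-ins for fruits and colors, and mtcars stands in for df:

```r
# Hypothetical predictor groups; the column names come from the built-in mtcars data.
engine  <- c("cyl", "disp", "hp")
chassis <- c("drat", "wt")

# Build mpg ~ cyl + disp + hp + drat + wt from the two character vectors.
fo <- reformulate(c(engine, chassis), response = "mpg")

# do.call() evaluates fo before glm() records its call, so the stored call
# holds the formula itself; quote(mtcars) keeps the data as a name.
fit <- do.call("glm", list(fo, data = quote(mtcars)))
fit$call
```

For the binomial model in the question, family = "binomial" would be added to the argument list in the same way.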

2) Another possibility is:

glm(dep_var ~., data = df[c("dep_var", fruits, colors)], family = "binomial")

If "dep_var", fruits and colors are the only columns in df then that can be shortened to:

glm(dep_var ~., data = df, family = "binomial")
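A small sketch of approach 2 on built-in data may help show why the subsetting matters: the dot expands to every column left in data, so any column not listed would otherwise become a predictor. The vectors engine and chassis are hypothetical, and mtcars stands in for df:

```r
# Hypothetical predictor groups; the column names come from the built-in mtcars data.
engine  <- c("hp")
chassis <- c("wt")

# Keep only the response and the chosen predictors, then let "." expand
# to every remaining column.
keep <- c("am", engine, chassis)
fit  <- glm(am ~ ., data = mtcars[keep], family = "binomial")

# The fitted terms are the intercept plus exactly the selected columns.
names(coef(fit))
```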
G. Grothendieck

You can paste together your model inputs and convert the result to a formula object.

Here's an example:

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

x1 <- c("cyl", "disp")
x2 <- c("drat", "wt")
y <- "mpg"

my_formula <- as.formula(paste0(y, "~", paste0(c(x1, x2), collapse = "+")))

# "my_formula" gives:
# mpg ~ cyl + disp + drat + wt

lm(my_formula, data = mtcars)

Call:
lm(formula = my_formula, data = mtcars)

Coefficients:
(Intercept)          cyl         disp         drat           wt  
  41.160271    -1.786074     0.007472    -0.010492    -3.638075  
deschen
  • This looks great, deschen. Thanks. Does `my_formula` need to include the dependent variable as well? Ideally I'd still like to add that to the model manually myself – C.Robin Nov 16 '21 at 10:46
  • I think so, yes. But it shouldn't make a difference for you, because you need to provide the DV anyway, either directly in the model call or when creating the formula. Creating the formula separately gives you more flexibility: you define your IVs and DV in one place, create the formula, and pass that into the model call. – deschen Nov 16 '21 at 10:49
  • True. My thinking was more that the code would be more readable to less technical audiences if the DV was spelled out each time I run the model – C.Robin Nov 16 '21 at 10:50
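If the goal is to keep the dependent variable visible in the model call itself, one possible sketch (not taken from the answers here) is to write the response as `am ~ .` and let update() supply the right-hand side built from the vectors. The vectors engine and chassis and the use of mtcars are assumptions for illustration:

```r
# Hypothetical predictor groups; each could hold any number of column names
# from the built-in mtcars data.
engine  <- c("hp")
chassis <- c("wt")

# One-sided formula built from the vectors: ~ hp + wt
rhs <- reformulate(c(engine, chassis))

# The response is spelled out in the call; update() keeps the left-hand side
# "am" and replaces the placeholder right-hand side "." with rhs.
fit <- glm(update(am ~ ., rhs), data = mtcars, family = "binomial")
formula(fit)
```

The Call: line will then show the update() expression rather than the expanded formula, so this trades one kind of readability for another.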

This may work if you build the formula string with paste0 inside the call to glm:

Data:

fruits <- c('apples', 'bananas', 'pears', 'apricots')
colors <- c('blue', 'red', 'orange', 'purple')

Step 1: collapse fruits and colors, connecting the elements with +:

fruits.1 <- paste0(fruits, collapse = " + ")
colors.1 <- paste0(colors, collapse = " + ")

Step 2: feed fruits.1 and colors.1 into glm using paste0:

glm(paste0("dependent_var ~ ", fruits.1, " + ", colors.1), data = df, family = "binomial")
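Since the answer is tentative about whether this works, here is a minimal check on built-in data. As an assumption for safety, it converts the string explicitly with as.formula() instead of relying on glm to coerce a character string. The vectors engine and chassis are hypothetical, and mtcars stands in for df:

```r
# Hypothetical predictor groups; the column names come from the built-in mtcars data.
engine  <- c("cyl", "disp")
chassis <- c("drat", "wt")

# Steps 1 and 2 combined: collapse all names with " + " and prepend the response.
rhs <- paste(c(engine, chassis), collapse = " + ")
fo  <- as.formula(paste("mpg ~", rhs))

# Fit with the default (gaussian) family; the formula object is passed directly.
fit <- glm(fo, data = mtcars)
names(coef(fit))
```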
Chris Ruehlemann