
I have a vector of 4 names called var.names. Using combn(), I obtain all unique combinations of these 4 names (i.e., each name alone, every combination of 2 and of 3, and all 4 together), stored in com.names.

How can I build an lm() formula from each column in com.names, putting a + sign between the names whenever a column holds more than one?

For example, com.names[[1]][,1] holds only one name ("gear"), so its formula would be mpg ~ gear. But com.names[[2]][,1] holds two names ("gear", "cyl"), so its formula would be mpg ~ gear + cyl, and so on (15 formulas overall).

These two related answers may be helpful.

var.names <- c("gear", "cyl", "drat", "disp") # from BASE R 'mtcars' dataset

com.names <- lapply(seq_along(var.names), function(i)combn(var.names, i)) # all combinations

# My incomplete attempt:
 lapply(com.names, function(x, d) lm(as.formula("mpg ~ ")   )), data = mtcars) # ???
rnorouzian

1 Answer


You're close. Let's rebuild your com.names so that each combination is collapsed into a single string, with the names joined by " + ":

# thanks to thelatemail for use of 'combn(..., FUN=)'
com.names <- lapply(seq_along(var.names), function(i) combn(var.names, i, FUN = paste, collapse = " + "))
com.names
# [[1]]
# [1] "gear" "cyl"  "drat" "disp"
# [[2]]
# [1] "gear + cyl"  "gear + drat" "gear + disp" "cyl + drat"  "cyl + disp"
# [6] "drat + disp"
# [[3]]
# [1] "gear + cyl + drat"  "gear + cyl + disp"  "gear + drat + disp"
# [4] "cyl + drat + disp"
# [[4]]
# [1] "gear + cyl + drat + disp"
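As a quick sanity check (a sketch, not part of the original answer), each list element should contain choose(4, k) strings for k = 1..4, i.e. 4, 6, 4 and 1, giving 2^4 - 1 = 15 right-hand sides in total, which matches the 15 formulas the question expects:

```r
# Rebuild the combined strings exactly as in the answer above
var.names <- c("gear", "cyl", "drat", "disp")
com.names <- lapply(seq_along(var.names), function(i)
  combn(var.names, i, FUN = paste, collapse = " + "))

# Strings per combination size should equal choose(4, k) for k = 1..4
counts <- lengths(com.names)
stopifnot(all(counts == choose(length(var.names), seq_along(var.names))))

# Flatten into one character vector of right-hand sides
rhs <- unlist(com.names)
length(rhs)      # 15
rhs[c(1, 5, 15)] # "gear", "gear + cyl", "gear + cyl + drat + disp"
```

Because combn() enumerates combinations in lexicographic order of position, the singles come first (1 to 4), then the pairs (5 to 10), the triples (11 to 14) and finally the full model (15).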

Now we can convert those into formulas (formulæ?) rather directly:

head(lapply(unlist(com.names), function(s) as.formula(paste("mpg ~ ", s))), n=3)
# [[1]]
# mpg ~ gear
# <environment: 0x0000000053e86950>
# [[2]]
# mpg ~ cyl
# <environment: 0x0000000031730970>
# [[3]]
# mpg ~ drat
# <environment: 0x0000000032c38d58>
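If you would rather not paste strings at all, base R's stats::reformulate() builds a formula from a character vector of term labels plus a response. The sketch below is an alternative to the answer's approach, not what it uses: it passes reformulate directly as combn()'s FUN (combn forwards the extra response argument), and simplify = FALSE keeps each result as a formula object instead of trying to coerce the results into a vector:

```r
var.names <- c("gear", "cyl", "drat", "disp")

# For each size i, combn() calls reformulate(<combination>, response = "mpg");
# simplify = FALSE returns a list of formula objects, and do.call(c, ...)
# concatenates the per-size lists into one list
forms <- do.call(c, lapply(seq_along(var.names), function(i)
  combn(var.names, i, FUN = reformulate, response = "mpg", simplify = FALSE)))

length(forms)         # 15
deparse(forms[[6]])   # "mpg ~ gear + drat"
```

The resulting list can be passed to lm() with lapply(forms, lm, data = mtcars).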

From there, it's just a matter of passing each formula to lm:

head(lapply(unlist(com.names), function(s) lm(as.formula(paste("mpg ~ ", s)), data=mtcars)), n=2)
# [[1]]
# Call:
# lm(formula = as.formula(paste("mpg ~ ", s)), data = mtcars)
# Coefficients:
# (Intercept)         gear  
#       5.623        3.923  
# [[2]]
# Call:
# lm(formula = as.formula(paste("mpg ~ ", s)), data = mtcars)
# Coefficients:
# (Intercept)          cyl  
#      37.885       -2.876  
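Once all 15 models are fitted, naming the list after its right-hand sides makes it easy to pull a summary statistic out of each one. This is a sketch that assumes adjusted R-squared is the statistic of interest; the question doesn't say how the models will be compared:

```r
var.names <- c("gear", "cyl", "drat", "disp")
com.names <- lapply(seq_along(var.names), function(i)
  combn(var.names, i, FUN = paste, collapse = " + "))
rhs <- unlist(com.names)

# Fit one model per right-hand side and name the list by its predictors
fits <- lapply(rhs, function(s) lm(as.formula(paste("mpg ~", s)), data = mtcars))
names(fits) <- rhs

# Adjusted R-squared per model; sapply() keeps the list names
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
round(sort(adj_r2, decreasing = TRUE), 3)
```

Individual models can then be looked up by name, e.g. fits[["gear + cyl"]].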
r2evans
  • `combn` also has a `FUN=` argument so you can avoid the `apply` - `lapply(seq_along(var.names), function(i) combn(var.names, i, FUN=paste, collapse="+"))` - you could even jam the `lm` into that process and do away with creating an intermediate object at all. – thelatemail Nov 08 '19 at 05:58
  • I'll replace the first, and let the OP play with compacting it further. Thanks! – r2evans Nov 08 '19 at 06:02
  • And while I'm at it, the `as.formula` isn't strictly necessary either - `lapply(unlist(com.names), function(s) lm(paste("mpg ~ ", s), data=mtcars))` will work. Since `?lm` says - *an object of class "formula" (or one that can be coerced to that class):* – thelatemail Nov 08 '19 at 06:04
  • Beware of potential scoping issues if you use `as.formula` outside model functions like `lm` (like in the second code block). In that case always use the `env` parameter. – Roland Nov 08 '19 at 07:22
  • Good point, thanks @Roland. – r2evans Nov 08 '19 at 07:28
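Putting thelatemail's and Roland's comments together, the fits can be produced inside combn() itself, with no intermediate object of strings, while giving each formula an explicit environment through as.formula()'s env argument. A sketch, assuming the global environment is an acceptable home for the formulas (substitute whichever environment holds any variables referenced outside data); fit_one is a hypothetical helper name introduced here:

```r
var.names <- c("gear", "cyl", "drat", "disp")

# Hypothetical helper: fit a model for one combination of predictor names.
# env = globalenv() pins the formula's environment explicitly instead of
# leaving it as this function's temporary evaluation frame
fit_one <- function(v) {
  f <- as.formula(paste("mpg ~", paste(v, collapse = " + ")), env = globalenv())
  lm(f, data = mtcars)
}

# combn() applies fit_one to every combination of size i;
# simplify = FALSE keeps the lm objects intact, do.call(c, ...) flattens the sizes
fits <- do.call(c, lapply(seq_along(var.names), function(i)
  combn(var.names, i, FUN = fit_one, simplify = FALSE)))

length(fits)                      # 15
environment(formula(fits[[15]]))  # <environment: R_GlobalEnv>
```

One trade-off of this compact form is that each stored call reads lm(formula = f, data = mtcars), so printing a model no longer shows its predictors; coef() or formula() on each fit still does.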