
Suppose we have a dataset with an outcome variable y and 5 covariates, and we want to fit a regression model where y is regressed on each possible combination of covariates. Since we have 5 covariates, we have 5! = 120 regression equations. I've been trying to automate this with reformulate() and update():

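match_variables <- c("x1", "x2", "x3", "x4", "x5")

match_equation <- y ~ x1

# For each covariate, add it to the right-hand side of match_equation;
# in update(), the "." stands for the existing right-hand side (x1)
matchvar_list <- lapply(match_variables, function(x, orig = match_equation) {
  new <- reformulate(c(x, "."))
  update(orig, new)
})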


matchvar_list
[[1]]
y ~ x1

[[2]]
y ~ x2 + x1

[[3]]
y ~ x3 + x1

[[4]]
y ~ x4 + x1

[[5]]
y ~ x5 + x1
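A small sketch of why the code above stops at 5 formulas: lapply() produces exactly one result per element of match_variables, and each update() call only adds a single covariate to y ~ x1. The "." in the new formula is replaced by the old right-hand side, and a term that is already present is dropped as a duplicate, which is why the first element is just y ~ x1. The names f_old and f_new are illustrative only.

```r
f_old <- y ~ x1

# "." in the new formula is replaced by the old right-hand side (x1)
f_new <- update(f_old, ~ x3 + .)
print(deparse(f_new))                    # "y ~ x3 + x1"

# A covariate that is already present is not duplicated
print(deparse(update(f_old, ~ x1 + .)))  # "y ~ x1"

# One update() per covariate, so the list has one formula per covariate
match_variables <- c("x1", "x2", "x3", "x4", "x5")
res <- lapply(match_variables, function(v) update(f_old, reformulate(c(v, "."))))
print(length(res))                       # 5
```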

The ultimate goal is to have a list of length 120 where each element is one of the possible combinations of covariates. I'm about 4% of the way there (5 of 120), and you can imagine using a brute-force approach to close the gap, but it seems like there should be a simple modification here that I'm not seeing.

Update

Actually, I made a stupid mistake and the math is wrong: it should be 31 regression equations. Order doesn't matter (y ~ x1 + x2 is the same model as y ~ x2 + x1), so we have choose(5,5) + choose(5,4) + choose(5,3) + choose(5,2) + choose(5,1) = 31.
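A minimal sketch checking the corrected count and building all 31 formulas with base R, extending the reformulate() idea from the question. It assumes the same five covariate names; combn(..., simplify = FALSE) returns each subset as a character vector, and reformulate(..., response = "y") turns it into a formula. The sum of choose(5, k) over k = 1..5 equals 2^5 - 1, i.e. every subset except the empty one.

```r
vars <- c("x1", "x2", "x3", "x4", "x5")

# Count the non-empty subsets: choose(5, 1) + ... + choose(5, 5)
n_models <- sum(choose(length(vars), seq_along(vars)))
print(n_models)                          # 31
print(n_models == 2^length(vars) - 1)    # TRUE

# All subsets of each size k, flattened into one list of character vectors
subsets <- unlist(
  lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# One formula per non-empty subset
forms <- lapply(subsets, reformulate, response = "y")
print(length(forms))                     # 31
print(deparse(forms[[31]]))              # "y ~ x1 + x2 + x3 + x4 + x5"
```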

hubert_farnsworth
  • There was some R function that does this; it also sorted the models by AIC afterwards. Unfortunately, I can't find it now! Please let me know. – Tomas Nov 25 '13 at 01:24
  • The idea is to be able to apply this with a number of packages. For example, `out <- lapply(matchvar_list, lm, data = data)`. You can imagine substituting lm() with anything under the sun. – hubert_farnsworth Nov 25 '13 at 01:28
  • Use `combn` (http://stackoverflow.com/questions/7906332/how-to-calculate-combination-and-permutation-in-r) to get indices, select the names using those, then build the formula with `paste` and `as.formula`. – Ari B. Friedman Nov 25 '13 at 02:03
  • Remember that interactions should also be tested. It is not so easy. I think there was a function in some package for that, no need to reinvent the wheel. Just find it :) – Tomas Nov 25 '13 at 02:13
  • Are you talking about the `MASS` package? You can do stepwise regression with it, which does what you describe and also gives the AIC, but that is not really the goal. I'm looking for something general that accepts more functions than just regression. – hubert_farnsworth Nov 25 '13 at 02:17
  • @Tomas, I think you're looking for the `dredge` function from the `MuMIn` package ... – Ben Bolker Nov 25 '13 at 02:50
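A sketch of the workflow described in these comments: fit every formula with one modelling function and rank the fits by AIC. Note that lapply() needs the function itself (lm, not lm()) and the data passed by name. The data frame dat is hypothetical simulated data, added only so the example runs; any function that accepts (formula, data = ...) could be swapped in for lm. The `dredge` function from `MuMIn` mentioned above automates a similar search but is not used here.

```r
set.seed(1)
# Hypothetical simulated data, only to make the sketch runnable
dat <- as.data.frame(matrix(rnorm(100 * 5), ncol = 5,
                            dimnames = list(NULL, paste0("x", 1:5))))
dat$y <- 1 + 2 * dat$x1 - dat$x3 + rnorm(100)

# All 31 non-empty subsets of the covariates, as formulas
vars <- paste0("x", 1:5)
subsets <- unlist(lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
                  recursive = FALSE)
forms <- lapply(subsets, reformulate, response = "y")

# Pass the function (lm), not a call (lm()); extra arguments go by name
fits <- lapply(forms, lm, data = dat)

# Rank the models by AIC (smaller is better)
aics <- vapply(fits, AIC, numeric(1))
ranked <- data.frame(model = vapply(forms, deparse, character(1)),
                     AIC = aics)[order(aics), ]
head(ranked)
```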

2 Answers


Here's an elaborated version of my earlier comment:

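match_variables <- c("x1", "x2", "x3", "x4", "x5")
# For each subset size i, split the combn() matrix into a list of its columns
combos <- lapply(seq(5), function(i) {
  as.list(as.data.frame(combn(x = match_variables, m = i), stringsAsFactors = FALSE))
})
# Flatten the list of lists into one list of 31 character vectors
combos <- unlist(combos, recursive = FALSE)
# Paste each subset into "y~x1+x2..." and convert it to a formula
forms <- lapply(combos, function(x) as.formula(paste0("y~", paste(x, collapse = "+"))))
> forms[[2]]
y ~ x2
<environment: 0x5c64a58>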

The `as.list(as.data.frame(...))` bit is just a trick to split a matrix into its column vectors. The `unlist(..., recursive = FALSE)` call removes the extra level of list nesting that accumulates across subset sizes. Then `as.formula(paste0(...))` puts it all together.
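A small sketch of the column-splitting trick on a 3-covariate example, compared with combn(..., simplify = FALSE), which returns the same column vectors directly as a list. stringsAsFactors = FALSE is passed explicitly because before R 4.0.0, as.data.frame() converted character columns to factors by default. The object names are illustrative only.

```r
m <- combn(c("x1", "x2", "x3"), 2)  # 2 x 3 character matrix, one combination per column

# as.data.frame() makes each column a data-frame column; as.list() then
# strips the data-frame class, leaving a named list (V1, V2, V3) of columns
cols_trick <- as.list(as.data.frame(m, stringsAsFactors = FALSE))

# combn() can return the same column vectors directly, as an unnamed list
cols_direct <- combn(c("x1", "x2", "x3"), 2, simplify = FALSE)

print(identical(unname(cols_trick), cols_direct))  # TRUE
```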

Ari B. Friedman
  • Nice! Any idea what's keeping the `<environment: 0x5c64a58>` thing there? – hubert_farnsworth Nov 25 '13 at 04:03
  • @hubert_farnsworth I believe that's just R indicating that each formula was created in a particular environment. So that if it references `x` it will (first?) look in the creation environment before moving on up the chain of environments to the global one or anything in between. – Ari B. Friedman Nov 25 '13 at 12:06
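A sketch illustrating the explanation above: as.formula() attaches the calling frame as the formula's environment by default, and print() for formulas shows that environment whenever it is not the global environment. Inside the anonymous function in the answer, the calling frame is that function's own environment. Passing env = globalenv() is one way to avoid the extra line; make_formula and make_formula_global are hypothetical helpers for illustration.

```r
make_formula <- function(vars) {
  # Default env = parent.frame(), i.e. this function's own frame
  as.formula(paste0("y ~ ", paste(vars, collapse = " + ")))
}
f_local <- make_formula(c("x1", "x2"))
print(identical(environment(f_local), globalenv()))   # FALSE: printed with <environment: ...>

make_formula_global <- function(vars) {
  # Attach the global environment explicitly instead
  as.formula(paste0("y ~ ", paste(vars, collapse = " + ")), env = globalenv())
}
f_global <- make_formula_global(c("x1", "x2"))
print(identical(environment(f_global), globalenv()))  # TRUE: printed without the extra line
```

The environment matters for model fitting: variables not found in the data argument are looked up in the formula's environment.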

A far from perfect solution building on the suggestions by @Ari:

library(functional)  # provides Curry(); older code loaded it from roxygen
# combn() is in base R (utils), so combinat is not needed
match_variables <- c("x1", "x2", "x3", "x4", "x5")
# combos[[k]] is a k-row matrix with one combination of k covariates per column
combos <- lapply(seq(5), Curry(combn, x = match_variables))

x <- list("y~x1+x2+x3+x4+x5")  # the full model

# Append every 1-, 2-, 3- and 4-covariate model as a formula string
for (k in 1:4) {
  for (i in 1:ncol(combos[[k]])) {
    x <- append(x, paste("y", paste(combos[[k]][, i], collapse = "+"), sep = "~"))
  }
}
hubert_farnsworth
  • Great job @hubert_farnsworth. Might I suggest a `sapply( unlist(x), as.formula )` at the end to convert them all to formulae. Also, I assume you were calling `roxygen` for `Curry`? It's been moved to the `functional` package in the latest versions. – Ari B. Friedman Nov 25 '13 at 12:05
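A sketch of the conversion step suggested in the comment above, using lapply() instead of sapply(): sapply() tries to simplify its result, whereas lapply() always returns a plain list, which is a predictable container for formula objects. The three strings below are a hypothetical sample shaped like the answer's x, so the example runs on its own.

```r
# A few entries shaped like the answer's x (a list of formula strings)
x <- list("y~x1+x2+x3+x4+x5", "y~x1", "y~x1+x2")

forms <- lapply(x, as.formula)   # always a plain list of formula objects
names(forms) <- unlist(x)        # optional: label each model by its string

print(class(forms[[2]]))         # "formula"
print(deparse(forms[[3]]))       # "y ~ x1 + x2"
```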