Different Combinations of 10 Variables but no Set Amount of Required Variables

Question

I am trying to find all combinations of 10 different variables for regressions I'm running. Not all 10 variables have to be present in each regression: some regressions will only have 2 or 3 variables, while others will have 7 or 8. For example, the following could be a few possible sets:

    a b c d e f g h i j
    a b c d
    e f g
    i j 
    a f g j

The order of the variables isn't important and there cannot be duplicates of a variable within a combination. Does anybody know a good way to generate all possible combinations of 10 variables under these specific terms?

See http://www.sthda.com/english/articles/37-model-selection-essentials-in-r/155-best-subsets-regression-essentials-in-r/ — G. Grothendieck, Nov 12 '19 at 23:36
You are looking for the [power set](https://en.wikipedia.org/wiki/Power_set). See `library(rje); ?powerSet` — Joseph Wood, Nov 13 '19 at 00:18

score 0 · Answer 1 · answered Nov 12 '19 at 23:31

Building on `combn`, we can iterate with `lapply` and `Map` to get the results into a usable shape.

I'll demonstrate with five levels, taking combinations of every size from 1 to 5:

some <- lapply(1:5, combn, x=5)
some
# [[1]]
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    3    4    5
# [[2]]
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    2    2    2    3    3     4
# [2,]    2    3    4    5    3    4    5    4    5     5
# [[3]]
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    1    1    2    2    2     3
# [2,]    2    2    2    3    3    4    3    3    4     4
# [3,]    3    4    5    4    5    5    4    5    5     5
# [[4]]
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    1    1    1    2
# [2,]    2    2    2    3    3
# [3,]    3    3    4    4    4
# [4,]    4    5    5    5    5
# [[5]]
#      [,1]
# [1,]    1
# [2,]    2
# [3,]    3
# [4,]    4
# [5,]    5

Since the matrices have different numbers of rows, we pad each one with NA rows so that they all have five rows.

some2 <- Map(function(m, nr) rbind(m, matrix(NA, nrow = nr - nrow(m), ncol = ncol(m))), some, 5)
some2
# [[1]]
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    3    4    5
# [2,]   NA   NA   NA   NA   NA
# [3,]   NA   NA   NA   NA   NA
# [4,]   NA   NA   NA   NA   NA
# [5,]   NA   NA   NA   NA   NA
# [[2]]
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    2    2    2    3    3     4
# [2,]    2    3    4    5    3    4    5    4    5     5
# [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [[3]]
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    1    1    2    2    2     3
# [2,]    2    2    2    3    3    4    3    3    4     4
# [3,]    3    4    5    4    5    5    4    5    5     5
# [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [[4]]
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    1    1    1    2
# [2,]    2    2    2    3    3
# [3,]    3    3    4    4    4
# [4,]    4    5    5    5    5
# [5,]   NA   NA   NA   NA   NA
# [[5]]
#      [,1]
# [1,]    1
# [2,]    2
# [3,]    3
# [4,]    4
# [5,]    5

From here, bind them into a single matrix so that you can iterate over all combinations, one per column:

out <- do.call(cbind, some2)
out
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
# [1,]    1    2    3    4    5    1    1    1    1     2     2     2     3     3     4     1
# [2,]   NA   NA   NA   NA   NA    2    3    4    5     3     4     5     4     5     5     2
# [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA    NA     3
# [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA    NA    NA
# [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA    NA    NA
#      [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
# [1,]     1     1     1     1     1     2     2     2     3     1     1     1     1     2     1
# [2,]     2     2     3     3     4     3     3     4     4     2     2     2     3     3     2
# [3,]     4     5     4     5     5     4     5     5     5     3     3     4     4     4     3
# [4,]    NA    NA    NA    NA    NA    NA    NA    NA    NA     4     5     5     5     5     4
# [5,]    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     5

Granted, you don't strictly need to combine them into one matrix here: you can instead iterate over each column of each matrix in the `some` list.

You can also map the indices in `out` onto your own levels:
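As a minimal sketch of that alternative, the unpadded `some` list can be walked directly. The name `combos` is only illustrative and is not part of the code above.

```r
# Rebuild the list of combination matrices (sizes 1 to 5 out of 5 levels).
some <- lapply(1:5, combn, x = 5)

# Collect every column of every matrix as its own integer vector.
# No NA padding is needed because each matrix is handled separately.
combos <- unlist(
  lapply(some, function(m) lapply(seq_len(ncol(m)), function(j) m[, j])),
  recursive = FALSE
)

length(combos)   # 31, i.e. 2^5 - 1 non-empty subsets
combos[[6]]      # 1 2 (the first combination of size two)
```

Each element of `combos` holds the indices of one combination, so a regression loop can consume them without checking for NA.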

mylevels <- c("fee", "fie", "foe", "fum", "quux")
out[] <- mylevels[out]
out
#      [,1]  [,2]  [,3]  [,4]  [,5]   [,6]  [,7]  [,8]  [,9]   [,10] [,11] [,12]  [,13] [,14] 
# [1,] "fee" "fie" "foe" "fum" "quux" "fee" "fee" "fee" "fee"  "fie" "fie" "fie"  "foe" "foe" 
# [2,] NA    NA    NA    NA    NA     "fie" "foe" "fum" "quux" "foe" "fum" "quux" "fum" "quux"
# [3,] NA    NA    NA    NA    NA     NA    NA    NA    NA     NA    NA    NA     NA    NA    
# [4,] NA    NA    NA    NA    NA     NA    NA    NA    NA     NA    NA    NA     NA    NA    
# [5,] NA    NA    NA    NA    NA     NA    NA    NA    NA     NA    NA    NA     NA    NA    
#      [,15]  [,16] [,17] [,18]  [,19] [,20]  [,21]  [,22] [,23]  [,24]  [,25]  [,26] [,27] 
# [1,] "fum"  "fee" "fee" "fee"  "fee" "fee"  "fee"  "fie" "fie"  "fie"  "foe"  "fee" "fee" 
# [2,] "quux" "fie" "fie" "fie"  "foe" "foe"  "fum"  "foe" "foe"  "fum"  "fum"  "fie" "fie" 
# [3,] NA     "foe" "fum" "quux" "fum" "quux" "quux" "fum" "quux" "quux" "quux" "foe" "foe" 
# [4,] NA     NA    NA    NA     NA    NA     NA     NA    NA     NA     NA     "fum" "quux"
# [5,] NA     NA    NA    NA     NA    NA     NA     NA    NA     NA     NA     NA    NA    
#      [,28]  [,29]  [,30]  [,31] 
# [1,] "fee"  "fee"  "fie"  "fee" 
# [2,] "fie"  "foe"  "foe"  "fie" 
# [3,] "fum"  "fum"  "fum"  "foe" 
# [4,] "quux" "quux" "quux" "fum" 
# [5,] NA     NA     NA     "quux"

(I'm assuming that you will drop or otherwise ignore the NA values in whatever function you use to regress on these levels.)
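One way to handle the NA padding, sketched under assumptions: the built-in `mtcars` data stands in for your own data, `mpg` is an assumed response, and the five predictor names are chosen only for illustration. Each column has its NA entries removed before a formula is built with `reformulate`.

```r
# Assumption: mtcars (built into R) stands in for your data,
# with mpg as the response and five candidate predictors.
mylevels <- c("wt", "hp", "disp", "cyl", "qsec")

# Same construction as above: all combinations, padded with NA, bound by column.
some  <- lapply(1:5, combn, x = 5)
some2 <- lapply(some, function(m) rbind(m, matrix(NA, nrow = 5 - nrow(m), ncol = ncol(m))))
out   <- do.call(cbind, some2)
out[] <- mylevels[out]

# Fit one model per column, dropping the NA padding first.
fits <- lapply(seq_len(ncol(out)), function(j) {
  preds <- out[!is.na(out[, j]), j]
  lm(reformulate(preds, response = "mpg"), data = mtcars)
})

length(fits)         # 31 fitted models
formula(fits[[6]])   # mpg ~ wt + hp
```

Keeping the fits in a list makes it straightforward to compare them afterwards, for example with `sapply(fits, AIC)`.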

score 0 · Answer 2 · answered Nov 12 '19 at 23:32

n <- 3
vars <- letters[1:n]
vars
#> [1] "a" "b" "c"

library(purrr)
1:n %>% 
  map(~combn(vars, ., simplify = FALSE)) %>% 
  flatten()
#> [[1]]
#> [1] "a"
#> 
#> [[2]]
#> [1] "b"
#> 
#> [[3]]
#> [1] "c"
#> 
#> [[4]]
#> [1] "a" "b"
#> 
#> [[5]]
#> [1] "a" "c"
#> 
#> [[6]]
#> [1] "b" "c"
#> 
#> [[7]]
#> [1] "a" "b" "c"

`combn(vars, m, simplify = FALSE)` gives us all the combinations of size `m` for any given `m`, 0 <= m <= n. So we iterate over `m` from 1 to `n` and flatten the per-size lists into one list, which yields every non-empty subset of `vars`.
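For the ten variables in the question, the same idea can be sketched in base R without purrr. The name `all_sets` is illustrative only.

```r
vars <- letters[1:10]   # the ten variables a..j from the question

# For each size m = 1..10, list every combination of that size,
# then flatten the per-size lists into a single list.
all_sets <- unlist(
  lapply(seq_along(vars), function(m) combn(vars, m, simplify = FALSE)),
  recursive = FALSE
)

length(all_sets)        # 1023, i.e. 2^10 - 1
sum(choose(10, 1:10))   # 1023, the same count from binomial coefficients
```

The count grows as 2^n - 1, so this stays cheap for 10 variables but becomes large quickly as more variables are added.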
