11

I couldn't find an answer to this anywhere, so here's my solution.

The question is: how can you calculate a power set in R?

It is possible to do this with the library "sets", with the command 2^as.set(c(1,2,3,4)), which yields the output {{}, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}}. However, this uses a recursive algorithm, which is rather slow.


Here's the algorithm I came up with.

It's non-recursive, so it's much faster than some of the other solutions out there (and ~100x faster on my machine than the algorithm in the "sets" package). The speed is still O(2^n).

The conceptual basis for this algorithm is the following:

for each element in the set:
    for each subset constructed so far:
        new subset = (subset + element)

Here's the R code:

EDIT: here's a somewhat faster version of the same concept; my original algorithm is in the third comment to this post. This one is 30% faster on my machine for a set of length 19.

powerset = function(s){
    len = length(s)
    l = vector(mode="list",length=2^len) ; l[[1]]=numeric()
    counter = 1L
    for(x in 1L:length(s)){
        for(subset in 1L:counter){
            counter=counter+1L
            l[[counter]] = c(l[[subset]],s[x])
        }
    }
    return(l)
}

This version saves time by initiating the vector with its final length at the start and keeping track with the "counter" variable of the position at which to save new subsets. It's also possible to calculate the position analytically, but that was slightly slower.

sssheridan
  • 690
  • 1
  • 6
  • 15
  • Now, I'm not sure why, but this other solution is slower than the one I posted. It involves only one list creation, but I guess having to do arithmetic slows it down. Its speed scales as something greater than O(n^2). `powerset1 = function(set){ ps = vector(mode="list",length=length(set)^2); ps[[1]] = numeric(); for(e in 1:length(set)){ f = 2^(e-1); for(subset in 1:f){ ps[[f+subset]] = c(ps[[subset]],set[e]); } } ; return(ps) }` – sssheridan Sep 10 '13 at 10:22
  • duplicate of http://stackoverflow.com/questions/17594392/how-do-i-find-all-possible-subsets-of-a-set-iteratively-in-r ? – Carl Witthoft Sep 10 '13 at 11:18
  • (Original algorithm: powerset = function(set){ ps = list(); ps[[1]] = numeric(); for(element in set){ temp = vector(mode="list",length=length(ps)); for(subset in 1:length(ps)){ temp[[subset]] = c(ps[[subset]],element) }; ps=c(ps,temp) }; return(ps) }; powerset(1:4) The list `temp` is a compromise between the speed costs of doing arithmetic and of creating new lists. – sssheridan Nov 26 '14 at 13:00
  • 1
    `set_power()` in the [sets](http://www.inside-r.org/packages/cran/sets/docs/as.set) library should also work. – Keith Hughitt Aug 19 '15 at 20:31

4 Answers4

17

A subset can be seen as a boolean vector, indicating whether an element is in the subset of not. Those boolean vectors can be seen as numbers written in binary. Enumerating all the subsets of 1:n is therefore equivalent to enumerating the numbers from 0 to 2^n-1.

f <- function(set) { 
  n <- length(set)
  masks <- 2^(1:n-1)
  lapply( 1:2^n-1, function(u) set[ bitwAnd(u, masks) != 0 ] )
}
f(LETTERS[1:4])
Vincent Zoonekynd
  • 31,893
  • 5
  • 69
  • 78
  • This is indeed a more elegant solution than mine. However, I benchmarked it, and it takes 50-70% more time for both small and large sets. – sssheridan Nov 26 '14 at 12:55
  • 2
    I am surprised it takes longer. But, as far as answers on stack go, this answer teaches a lot. Its way outside of the box -in a good way. – mgriebe Aug 16 '15 at 15:26
4

There is a function powerset in the package HapEstXXR which seems to be faster than your function and the function in the other answer. See below (your function is called your.powerset)

require('microbenchmark')
require('HapEstXXR')
microbenchmark(powerset(LETTERS[1:10]), f(LETTERS[1:10]), your.powerset(LETTERS[1:10]), times=100)


Unit: microseconds
                         expr      min        lq    median        uq       max neval
      powerset(LETTERS[1:10])  314.845  388.4225  594.2445  686.6455   857.513   100
             f(LETTERS[1:10]) 7144.132 7937.6040 8222.1330 8568.5120 17805.335   100
 your.powerset(LETTERS[1:10]) 5016.981 5564.2880 5841.9810 6025.0690 29138.763   100

Since powerset seems to be very fast you might want to look at the code of the function in the HapEstXXR package.

user1981275
  • 13,002
  • 8
  • 72
  • 101
  • Yes, it is very fast...because it uses C/C++ code. The speed difference isn't as big for larger powersets, but on my machine, it's still about 2-3x faster, and also scales as O(2^n). The disadvantage is that, for sets larger than 15, it can only output the powerset to a file, not to R. For a set of 15, my function takes ~0.15 sec. – sssheridan Sep 17 '13 at 10:49
2

Here's another simple approach that seems to perform reasonably well for small sets. There are obvious memory issues with this method as the data cardinality increases.

getPowSet <- function(set) {     
  n <- length(set)
  keepBool <- sapply(2^(1:n - 1), function(k) 
    rep(c(FALSE, TRUE), each=k, times=(2^n / (2*k))))
  lapply(1:2^n, function(j) set[keepBool[j, ]])
}

Speed comparison for n=10:

microbenchmark(powerset(LETTERS[1:10]), f(LETTERS[1:10]), getPowSet(LETTERS[1:10]))

Unit: milliseconds
                     expr      min       lq     mean   median       uq      max neval
  powerset(LETTERS[1:10]) 2.466167 2.551928 2.656964 2.581211 2.637358 3.676877   100
         f(LETTERS[1:10]) 2.923339 3.029928 3.115222 3.104413 3.175931 4.033136   100
 getPowSet(LETTERS[1:10]) 2.415290 2.490522 2.574131 2.547115 2.617198 3.664040   100

But then for n=15 the original function seems to perform better:

microbenchmark(powerset(LETTERS[1:15]), f(LETTERS[1:15]), getPowSet(LETTERS[1:15]))

Unit: milliseconds
                     expr       min        lq      mean    median       uq      max neval
  powerset(LETTERS[1:15])  81.48276  88.50272  94.88927  91.61366  94.8262 174.0636   100
         f(LETTERS[1:15]) 110.86208 118.08736 124.38501 122.35872 126.7830 189.3131   100
 getPowSet(LETTERS[1:15])  86.16286  93.32314  98.14936  96.85443 100.6075 159.1030   100
dpritch
  • 21
  • 1
1

The below should create a power set, minus the empty set element.

powerset <- function(x) {
  sets <- lapply(1:(length(x)), function(i) combn(x, i, simplify = F))
  unlist(sets, recursive = F)
}
drj
  • 11
  • 1