1

I want to find a matrix with all possible combinations of 0's and 1's. The conditions of these possible combinations are no reputation of individual possibilities, and for each possible vector has the specified number of 1's. For example, I have a number of object n = 6, and a number of samples r = 3, which means 6 slots, in each slot, (possible combination), there is a number of 1's = 3. Using the choose() function in R, we can find the number of possibilities which is 20.

choose(n=6,k=3) #calculate the number of combinations without replacement/repetition

The desirable output matrix of all possible combination as below:

1, 1 1 1 0 0 0
2, 1 1 0 1 0 0
3, 1 1 0 0 1 0
4, 1 1 0 0 0 1
5, 1 0 1 1 0 0
6, 1 0 1 0 1 0
7, 1 0 1 0 0 1
8, 0 1 1 1 0 0 
9, 0 1 1 0 1 0
10,0 1 1 0 0 1
11,0 0 1 1 1 0
12,0 0 1 1 0 1
14,0 0 0 1 1 1 
15,1 0 0 1 1 0
16,0 1 0 1 1 0 
17,1 0 0 1 0 1
18,1 0 0 0 1 1 

These possiblities should be equal to 20, howover, I found 18 only. I will apply this concept for a large amount of data set, such as instead of 6 slots, and 3 1's it will be 200 slots and 100 1's, respectively. Therefore, I need an algorithm or a built-in function in R to give me the output. Thanks.

yaseen
  • 25
  • 6
  • 1
    These are not combinations. They are permutations since order matters. Also note, that for your real data set, you will probably have to change your approach as the number of results is huge: `RcppAlgos::permuteCount(0:1, freqs = c(100, 100)) = 9.054851e+58`. – Joseph Wood Jul 05 '18 at 12:09

2 Answers2

3
t(combn(6,3,function(x)replace(numeric(6),x,1)))
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    1    1    0    0    0
 [2,]    1    1    0    1    0    0
 [3,]    1    1    0    0    1    0
 [4,]    1    1    0    0    0    1
 [5,]    1    0    1    1    0    0
 [6,]    1    0    1    0    1    0
 [7,]    1    0    1    0    0    1
 [8,]    1    0    0    1    1    0
 [9,]    1    0    0    1    0    1
[10,]    1    0    0    0    1    1
[11,]    0    1    1    1    0    0
[12,]    0    1    1    0    1    0
[13,]    0    1    1    0    0    1
[14,]    0    1    0    1    1    0
[15,]    0    1    0    1    0    1
[16,]    0    1    0    0    1    1
[17,]    0    0    1    1    1    0
[18,]    0    0    1    1    0    1
[19,]    0    0    1    0    1    1
[20,]    0    0    0    1    1    1

You can write a function:

fun=function(n,m)t(combn(n,m,function(x)replace(numeric(n),x,1)))
fun(6,3
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Thanks Onyambu, Awesome, work perfectly and very quick answer. It seems a very clever answer, so I need to understand how it is work, I mean each part of the function. – yaseen Jul 05 '18 at 12:01
  • start with `combn(6,3)` then you will see what happens – Onyambu Jul 05 '18 at 12:08
1

This is simply permutations of the multiset 0:1. There are a couple of libraries capable for handling these efficiently: RcppAlgos (I am the author) and arrangements.

RcppAlgos::permuteGeneral(1:0, freqs = c(3, 3))

arrangements::permutations(x = 1:0, freq = c(3, 3))

Both give the desired result. You will note that the vector passed is in descending order (i.e. 1:0). This is so, because both of the libraries produce their output in lexicographical order.

As noted in the comments, for your real data none of the posted solutions will work as the number of results is way too big.

RcppAlgos::permuteCount(0:1, freqs = c(100,100))
[1] 9.054851e+58

arrangements::npermutations(x = 0:1, freq = c(100, 100), bigz = TRUE)
Big Integer ('bigz') :
[1] 90548514656103281165404177077484163874504589675413336841320

Since generating this amount of data at one time is simply not feasible, both of the packages, arrangements and RcppAlgos, offer alternative approaches that will allow one to tackle larger problems.

arrangements

For the package arrangements, you can set up an iterator that allows the user to generate combinations/permutations n at a time avoiding the overhead of generating all of them.

library(arrangements)
iperm <- ipermutations(x = 1:0, freq = c(3,3))

## get the first 5 permutations
iperm$getnext(d = 5)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    0    0    0
[2,]    1    1    0    1    0    0
[3,]    1    1    0    0    1    0
[4,]    1    1    0    0    0    1
[5,]    1    0    1    1    0    0

## get the next 5 permutations
iperm$getnext(d = 5)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    1    0    1    0
[2,]    1    0    1    0    0    1
[3,]    1    0    0    1    1    0
[4,]    1    0    0    1    0    1
[5,]    1    0    0    0    1    1


RcppAlgos

For RcppAlgos, there are arguments lower and upper that allow for generations of specific chunks.

library(RcppAlgos)
permuteGeneral(1:0, freqs = c(3,3), lower = 1, upper = 5)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    0    0    0
[2,]    1    1    0    1    0    0
[3,]    1    1    0    0    1    0
[4,]    1    1    0    0    0    1
[5,]    1    0    1    1    0    0

permuteGeneral(1:0, freqs = c(3,3), lower = 6, upper = 10)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    1    0    1    0
[2,]    1    0    1    0    0    1
[3,]    1    0    0    1    1    0
[4,]    1    0    0    1    0    1
[5,]    1    0    0    0    1    1

As these chunks are generated independently, one can easily generate and analyze in parallel:

library(parallel)
mclapply(seq(1,20,5), function(x) {
    a <- permuteGeneral(1:0, freqs = c(3,3), lower = x, upper = x + 4)
    ## Do some analysis
}, mc.cores = detectCores() - 1)

You will not notice any speedup for this small example, but there is a noticeable gain as the number of results get large.

There is a lot more information on this topic in a summary I wrote to the question: R: Permutations and combinations with/without replacement and for distinct/non-distinct items/multiset.

Joseph Wood
  • 7,077
  • 2
  • 30
  • 65