1

My question is almost addressed perfectly in the following post.

Original Post: R - generate all possible pairwise combinations of binary vectors

However, I have an additional condition to add which will invalidate some of the solutions and I need to remove them. For example, consider following 6 pairwise output:

      [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0  

[1,]    1    0    0
[2,]    0    0    1

[1,]    0    1    0
[2,]    1    0    0

[1,]    0    1    0
[2,]    0    0    1

[1,]    0    0    1
[2,]    1    0    0

[1,]    0    0    1
[2,]    0    1    0

In my problem, 3rd,5th and 6th pair needs to be removed as invalid. The condition is, a following vector can not have 1 in a position that is earlier than the previous vector. If in the first vector, there is a 1 in the 2nd position, then in the second vector, 1 can be either in 2nd or 3rd position but NOT IN first.

Is this possible to implement it in the solution posted in the original post? Is it possible to have fast solution for this as I need to work with large number of combinations?

Rel_Ai
  • 581
  • 2
  • 11

2 Answers2

2

You could replace the nth element of a vector out of zeros with 1.

FUN <- function(m, n, ...) {
  combn(n, m, function(i, ...) t(sapply(i, function(j, ...) `[<-`(rep(0, n), j, 1))), ...)
}
FUN(2, 3, simplify=FALSE)
# [[1]]
#       [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    1    0
# 
# [[2]]
#       [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    0    1
# 
# [[3]]
#      [,1] [,2] [,3]
# [1,]    0    1    0
# [2,]    0    0    1

The dots are used to loop through an optional simplify=FALSE argument. If you leave it out you get an array. Don't know what you'd prefer, you could set one as default.

FUN(2, 3)
# , , 1
# 
#      [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    1    0
# 
# , , 2
# 
#      [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    0    1
# 
# , , 3
# 
#      [,1] [,2] [,3]
# [1,]    0    1    0
# [2,]    0    0    1

This also works with more rows and columns.

FUN(8, 10, simplify=FALSE)
# [[1]]
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    0    0    0    0    0    0    0    0     0
# [2,]    0    1    0    0    0    0    0    0    0     0
# [3,]    0    0    1    0    0    0    0    0    0     0
# [4,]    0    0    0    1    0    0    0    0    0     0
# [5,]    0    0    0    0    1    0    0    0    0     0
# [6,]    0    0    0    0    0    1    0    0    0     0
# [7,]    0    0    0    0    0    0    1    0    0     0
# [8,]    0    0    0    0    0    0    0    1    0     0
# 
# [[2]]
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    0    0    0    0    0    0    0    0     0
# [2,]    0    1    0    0    0    0    0    0    0     0
# [3,]    0    0    1    0    0    0    0    0    0     0
# [4,]    0    0    0    1    0    0    0    0    0     0
# [5,]    0    0    0    0    1    0    0    0    0     0
# [6,]    0    0    0    0    0    1    0    0    0     0
# [7,]    0    0    0    0    0    0    1    0    0     0
# [8,]    0    0    0    0    0    0    0    0    1     0
# ...

EDIT 1

If you want duplicate rows as valid matrices you could use RcppAlgos::permuteGeneral and check if the differences are all greater than or equal to zero.

FUN2 <- function(m, n) {
  v <- RcppAlgos::permuteGeneral(n, m, rep=T)
  v <- as.data.frame(t(v[apply(v, 1, function(x) all(diff(x) >= 0)), ]))
  unname(lapply(v, function(j) t(sapply(j, function(k) `[<-`(rep(0, n), k, 1)))))
}
FUN2(2, 3)
# [[1]]
#      [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    1    0    0
# 
# [[2]]
#       [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    1    0
# 
# [[3]]
#       [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    0    1
# 
# [[4]]
#       [,1] [,2] [,3]
# [1,]    0    1    0
# [2,]    0    1    0
# 
# [[5]]
#       [,1] [,2] [,3]
# [1,]    0    1    0
# [2,]    0    0    1
# 
# [[6]]
#       [,1] [,2] [,3]
# [1,]    0    0    1
# [2,]    0    0    1

And it's fast!

system.time(FUN2(5, 10))
# user  system elapsed 
# 1.31    0.00    1.40 

Note, that there's also a RcppAlgos::comboGeneral function that is similar to base combn but probably faster.

EDIT 2

We can make it even faster using matrixStats::rowDiffs.

FUN3 <- function(m, n) {
  v <- RcppAlgos::permuteGeneral(n, m, rep=T)
  v <- as.data.frame(t(v[apply(matrixStats::rowDiffs(v) >= 0, 1, all), ]))
  unname(lapply(v, function(j) t(sapply(j, function(k) `[<-`(rep(0, n), k, 1)))))
}
system.time(FUN3(6, 11))
# user  system elapsed 
# 3.80    0.03    3.96 
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • I think your solution is more complete and working well. One concern is the speed as for FUN2(5,10) it took about 29 seconds. – Rel_Ai Dec 12 '20 at 15:46
  • 1
    Yeah you seem to be right, comboGeneral took only 1.21sec. Let me play around with your code a bit but it seems like a great solution you provided here. Thank you – Rel_Ai Dec 12 '20 at 15:50
  • 1
    @Rel_Ai See my edit to `FUN2`, it's better to calculate `diff` before creating the matrices, now it's pretty fast :) – jay.sf Dec 12 '20 at 16:06
  • There are couple of interesting things. 1) I can't go beyond m=11 when n=5. although the number of combinations are not many (only 1365 for this). 2) FUN3(10,5):12.59sec (1001 combinations). but FUN3(5,20):4.49sec (42504) combinations. When m>n, function seems to perform poorly. Do you see any way out of this? Thanks – Rel_Ai Dec 15 '20 at 12:06
  • @Rel_Ai I think that's a computation issue. You may check the number of permutations using e.g. `RcppAlgos::permuteCount(6, 12, repetition=TRUE)` and draw your own conclusions. – jay.sf Dec 15 '20 at 12:22
1

You can get all such unique combinations in a list with a one-liner in base R:

lapply(as.data.frame(combn(3, 2)), function(x) +rbind(1:3 == x[1], 1:3 == x[2]))
#> $V1
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    1    0
#> 
#> $V2
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    0    1
#> 
#> $V3
#>      [,1] [,2] [,3]
#> [1,]    0    1    0
#> [2,]    0    0    1

And this works for any reasonable length of vector. For example, length 4:

lapply(as.data.frame(combn(4, 2)), function(x) +rbind(1:4 == x[1], 1:4 == x[2]))
#> $V1
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    1    0    0
#> 
#> $V2
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    0    1    0
#> 
#> $V3
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    0    0    1
#> 
#> $V4
#>      [,1] [,2] [,3] [,4]
#> [1,]    0    1    0    0
#> [2,]    0    0    1    0
#> 
#> $V5
#>      [,1] [,2] [,3] [,4]
#> [1,]    0    1    0    0
#> [2,]    0    0    0    1
#> 
#> $V6
#>      [,1] [,2] [,3] [,4]
#> [1,]    0    0    1    0
#> [2,]    0    0    0    1

EDIT

A general solution for an arbitrary number of vectors of arbitrary length would be:

get_unique <- function(n_vectors, length)
{
  df <- as.data.frame(combn(length, n_vectors))
  lapply(df, function(x) {
    +do.call(rbind, lapply(x, function(i) seq(length) == i))
  })
}

Or, if repeats are allowed:

get_unique <- function(n_vectors, length)
{
  df <- as.data.frame(cbind(combn(length, n_vectors), 
                            matrix(rep(seq(length), each = n_vectors), 
                                   ncol = length)))
  lapply(df, function(x) {
    +do.call(rbind, lapply(x, function(i) seq(length) == i))
  })
}

Created on 2020-12-12 by the reprex package (v0.3.0)

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Thank you for your solution. I am actually looking for a general approach where I can take any number and length of vectors. for example 5 vectors of length 10 or 3 vectors of length 8, something like that. I was trying to change yours but could not find a way out. – Rel_Ai Dec 12 '20 at 13:16
  • 1
    @Rel_Ai see my update. The function can be used for an arbitrary number of vectors of arbitrary length – Allan Cameron Dec 12 '20 at 13:27
  • For the originally posted problem your solution works fine. However I was wondering if it is possible to add replacement? For example, get_unique(2,3) should also return [(1,0,0):(1,0,0)], [(0,1,0):(0,1,0)], [(0,0,1):(0,0,1)] as valid combinations? – Rel_Ai Dec 12 '20 at 15:08
  • I can see a problem but not sure how to present it in the comment. First problem: it's missing some valid solutions (try with get_unique(3,4)) 2nd problem: It does not work when n_vectors >= length. – Rel_Ai Dec 12 '20 at 15:27