2

I'd like to get every combination of 280 elements that sum 274, but every value must be an integer and between 0 and 4.

There is a function that almost do this, that is restrictedparts (which i found here: Getting all combinations which sum up to 100 using R )...but still have to get only elements with value up to 4.

1 Answers1

2

Generally these types of problems can be handled by the partitions package however, I was not able to find a solution using that package. I wouldn't exclude this package altogether just yet, as I have continuously discovered really nice surprises over the past few years. I digress.

First off, the maximum number of positive integers that sum to 274 is 274 (e.g. sum(rep(1, 274))). Thus any solution with more than 274 elements, say n, will be identical except for an additional n - 274 zeros per combination.

As an example demonstrating this, let's say we are looking for every combination of 10 elements that sum to 8 where each element is an integer between 0 and 2. The only solutions are:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    0    2    2    2     2
[2,]    0    0    0    0    0    1    1    2    2     2
[3,]    0    0    0    0    1    1    1    1    2     2
[4,]    0    0    0    1    1    1    1    1    1     2
[5,]    0    0    1    1    1    1    1    1    1     1

As you can see, the last row has the maximal number of positive elements (i.e. 8 ones).

This is an important observation because we can greatly reduce the number of combinations to test by capping the number of elements to the desired sum.

The number of combinations with repetition of n elements choose k is given by the binomial coefficient where the top number is n + k - 1 and the bottom number is k:

enter image description here

Thus, for our example we can reduce the number of possible checks by over 20 million:

choose(280 + 5 - 1, 280)
[1] 265368251

choose(274 + 5 - 1, 274)
[1] 243531475

265368251 - 243531475
[1] 21836776

Although we have reduced the space of possibilities, we are still left with a difficult task ahead. Generating all combinations and testing their sum is not likely to produce any solutions in a reasonable time.

Updated answer with newer versions of RcppAlgos

In version 2.3.5 a generalized partition algorithm is implemented which is very efficient. It works with standard combinations with/without repetition as well as multisets:

## Using version 2.4.1
system.time(comb274 <- RcppAlgos::comboGeneral(0:4, m = 274, TRUE, 
                                               constraintFun = "sum",
                                               comparisonFun = "==",
                                               limitConstraints = 274))
 user  system elapsed 
0.420   0.196   0.608

dim(comb274)
[1] 150811    274

It should also be noted that we no longer need to cap the results as we are making use of std::vector::push_back() underneath. This is preferable as STL vectors grow very efficiently and we can now avoid preallocating the maximum number of rows for an Rcpp::Matrix (This was causing the memory exhaustion we see below).

We can still gain efficiency by using a reasonable cap (i.e. setting upper). Doing this in later versions makes calls to std::vector::reserve():

system.time(comb274 <- RcppAlgos::comboGeneral(0:4, m = 274, TRUE, 
                                               constraintFun = "sum",
                                               comparisonFun = "==",
                                               limitConstraints = 274,
                                               upper = 1e6))
 user  system elapsed 
0.225   0.105   0.327

We see about a 10 fold improvement in efficiency over older versions.

Old answer with RcppAlgos prior to v2.3.3

A more reasonable solution is to exclude many combinations without needing to check their sum. By generating combinations in lexicographical order, once a particular combination exceeds the constraint, we can skip many combinations knowing that they too will exceed the constraint. This is exactly what comboGeneral from RcppAlgos (I am the author) does.

library(RcppAlgos) ## v2.3.2

comb274 <- comboGeneral(0:4, m = 274, TRUE, 
                        constraintFun = "sum",
                        comparisonFun = "==",
                        limitConstraints = 274)
Error: vector memory exhausted (limit reached?)

As you can see, this isn't going to work in its raw form as there are simply too many combinations to test (at least on my machine.. you can alter the memory limits in R on macOS R on MacOS Error: vector memory exhausted (limit reached?)).

This is no problem. All we need to do is put a limit on the number of expected results using the upper parameter. I will arbitrarily set it to 1 million. If we get 1 million results, we increase this limit until the number of results is less than our bound.

## Again, this was for v2.3.2
system.time(comb274 <- comboGeneral(0:4, m = 274, TRUE, 
                                    constraintFun = "sum",
                                    comparisonFun = "==",
                                    limitConstraints = 274,
                                    upper = 1e6))
 user  system elapsed 
3.624   0.079   3.705

dim(comb274)
[1] 150811    274

And there you have it! Confirming that every row sums to 274, we have:

all(rowSums(comb274) == 274)
[1] TRUE

If you really need 280 elements, you can either run the above code setting the parameter m to 280 at the cost of efficiency, or simply cbind a matrix of zeros with 150811 rows and 6 columns to comb274.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
Joseph Wood
  • 7,077
  • 2
  • 30
  • 65