7

I wish to find the speediest way to find up to 1000 possible combinations of 'n' integers to find a target integer.

For example. Say I wanted to sum to the number '20'. I want to find up to 1000 combinations of four integers that sum to this number. The integers can repeat themselves. I also have a condition that the integer must not be smaller than a particular number, in this case 4.

target<-20  #the number I wish to sum to
lowest<-4   #the smallest integer I allow
size<-4 #the number of integers I wish to use to sum
maxposs <- target - ((size-1) * lowest) #given the lowest, this is the max possible integer. In my example it is 8.

This is how I have started to work this out. Using combn to find all combinations of the four chosen integers and then filtering by those that sum to my target.

m <- combn(rep(lowest:maxposs,size), size)
m1<- m[,colSums(m)==target]

Here, 'm1' has 245 columns. There are only this many solutions. The last few columns:

#     [,238] [,239] [,240] [,241] [,242] [,243] [,244] [,245]
#[1,]      4      4      4      4      4      4      5      5
#[2,]      5      5      5      6      7      4      6      4
#[3,]      7      4      5      4      4      5      4      5
#[4,]      4      7      6      6      5      7      5      6

However, in my real application, I can be dealing with very high integers (summing up to 1000) and want to limit myself to a random sample of 1000 possible combinations. As this is for a randomization statistical test, speed is of the essence. I wonder if anyone knows of a faster way of doing this. My way doesn't feel intuitively quick.

jalapic
  • 13,792
  • 8
  • 57
  • 87

1 Answers1

7
my_matrix <- matrix(nrow = 1000, ncol = 4)
i <- 1
nn <- 1000
while(i <= 1000){
  x <- sample(x = 4:nn, size = 3)
  y = nn - sum(x)
  if(y >= 4){
    my_matrix[i, ] <- c(x, y)
    i <- i + 1
  }
}

Per Gavin's suggestion, redone with a preallocated matrix. Now this runs in .158 seconds, twice as fast, and probably scales better.

goodtimeslim
  • 880
  • 7
  • 13
  • 1
    You would make this quicker still if you didn't commit the cardinal sin of growing an object, a data frame no less, to compound the issue!, in a `for` loop. Allocate a matrix of 4 columns and 1000 rows. Fill in the `i`th row if it meets the criterion. For larger problems, the copying of the data frame will just kill performance. – Gavin Simpson Jun 16 '15 at 05:19
  • noted and changed. I'm trying to break bad habits like that, but didn't think too much about it since it only took a half second anyway in this case. – goodtimeslim Jun 16 '15 at 05:30