1

Background - I want to try and exhaustively search a set of all possible combinations of 250 rows taken 10 at a time. In order to iteratively get this, I use the following code

`
## Function definition
gen.next.cbn <- function(cbn, n){
  ## Generates the combination that follows the one provided as input
  cbn.bin      <- rep(0, n)
  cbn.bin[cbn] <- 1
  if (tail(cbn.bin, 1) == 0){
    ind <- tail(which(cbn.bin == 1), 1)
    cbn.bin[c(ind, ind+1)] <- c(0, 1)
  }else{
    ind <- 1 + tail(which(diff(cbn.bin) == -1), 1)
    nb  <- sum(cbn.bin[-c(1:ind)] == 1)
    cbn.bin[c(ind-1, (n-nb+1):n)] <- 0
    cbn.bin[ind:(ind+nb)]         <- 1
  }
  cbn <- which(cbn.bin == 1)
}

## Example parameters
n   <- 40
k   <- 10

## Iteration example
for (i in 1:choose(n, k)){
  if (i == 1){
    cbn <- 1:k
  }else{
    cbn <- gen.next.cbn(cbn, n)

  }
  print(cbn)


}


`

I get the error "cannot allocate vector of size n GB" when I go beyond 40 rows.

Ideal Solution: a) If the combinations can be dumped and memory can be flushed iteratively after every run in the loop (where I can check the further conditions) b) If the combinations can be dumped to a csv file such that it does not cause a memory hog.

Thanks for your support.

  • 2
    There are [219005316087032475](https://www.wolframalpha.com/input/?i=250+ncr+10) combinations of 10 out of 250 rows. Even if you could do one million combinations per second, that would take over 6900 years to run. Are you sure this is what you want to do? – Florian Jan 05 '18 at 08:43
  • If you really want to do this, you need to massively parallelize it. You also should not use R then but a compiled language. Do you have access to a really large cluster? – Roland Jan 05 '18 at 09:00
  • Thanks Florian, you are correct. I could even survive if I can get away with 5 out of 200 rows. This brings down complexity to 1e12 which I believe can be cracked in less than a week. – Kannan Subramanian Jan 05 '18 at 09:16
  • You should check out `iterpc`. This is exactly what it was built for. – Joseph Wood Jan 05 '18 at 14:33

1 Answers1

1

As I said in the comments, iterpc is the way to go for such a task. You first need to initialize an iterator via the iterpc function. Next we can generate the next n combinations via getnext. After this, we simply append our results to a csv (or any file type you like).

getComboChunks <- function(n, k, chunkSize, totalCombos, myFile) {
    myIter <- iterpc(n, k)

    ## initialized myFile
    myCombs <- getnext(myIter, chunkSize)
    write.table(myCombs, file = myFile, sep = ",", col.names = FALSE)

    maxIteration <- (totalCombos - chunkSize) %/% chunkSize

    for (i in 1:maxIteration) {
        ## get the next "chunkSize" of combinations
        myCombs <- getnext(myIter, chunkSize)

        ## append the above combinations to your file
        write.table(myCombs, file = myFile, sep = ",",
                    col.names = FALSE , append = TRUE)
    }
}

For example, getComboChunks(250, 10, 100, 1000, "myCombos.csv") will write out 1000 combinations of 250 choose 10 to the file myCombos.csv 100 combinations at a time. Doing this in chunks will be more efficient than one at a time.

This library is written in C/C++ so it should be fairly efficient, but as @Florian points out in the comments, it won't produce all gmp::chooseZ(250, 10) = Big Integer ('bigz') : [1] 219005316087032475 combinations any time soon. I haven't tested it, but if you settle for 200 choose 5, I think you will be able to produce it in under a day (it is just over 2.5 billion results).

Joseph Wood
  • 7,077
  • 2
  • 30
  • 65
  • Thanks Joseph Wood. This seems to be efficient and does the trick. I am trying it out now. – Kannan Subramanian Jan 05 '18 at 22:13
  • Is there a better way to check for conditions. The iterpc function has helped dump the values and I can sense its efficiency as you mentioned it has written in CC++ . – Kannan Subramanian Jan 07 '18 at 07:58
  • What conditions are you referring to.? Could you give an example? – Joseph Wood Jan 07 '18 at 13:34
  • Hi @Joseph I am trying something along the lines of the code presented here - [link](https://stackoverflow.com/questions/48135498/increase-speed-and-work-on-bigger-combinations-using-for-loop?noredirect=1#comment83248787_48135498) – Kannan Subramanian Jan 07 '18 at 21:48