-4

I had a custom deck consisting of eight cards of the sequence 2^n, n=0,..,6. I draw cards (without replacement) until the sum is equal or greater than the threshold. How can I implement in R a function that calculates the mean of the difference between the sum and the threshold??

I tried to do it using this How to store values in a vector with nested functions but it takes ages... I think there is a way to do it with probabilities/simulations but I can figure out.

The threshold could be greater than the value of one single card, ie, threshold=500 or less than the value of a single card, ie, threshold=50

What I have done so far is to find all the subsets that meet the condition of the sum greater or equal to the threshold. Then I will only substract the threshold and calculate the mean.

I am using the following code in R. For a small set I get the answer quite fast. However, I have been running the function for several ours with the set containing the 56 numbers and is still working.

 set<-c(rep(1,8),rep(2,8), rep(4,8),rep(8,8),rep(16,8),rep(32,8),rep(64,8))
recursive.subset <-function(x, index, current, threshold, result){
  for (i in index:length(x)){
    if (current + x[i] >= threshold){
      store <<- append(store, sum(c(result,x[i])))
    } else {
      recursive.subset(x, i + 1, current+x[i], threshold, c(result,x[i]))
    }
  }
}

store <- vector()
inivector <- vector(mode="numeric", length=0) #initializing empty vector    
recursive.subset (set, 1, 0, threshold, inivector)
Community
  • 1
  • 1
ph33
  • 3
  • 2
  • Hi. Welcome to Stack Overflow. When posting an [r] question it's often useful to provide a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). When asking questions like this, share what you've tried, even if it didn't work, so we've got something to start with. – Phil Jan 31 '16 at 13:42
  • Hi, sorry I´m new on stackoverflow. I edited the question to show what I have done so far. Probably my approach is "brute force" but I can´t figure out how can I improve it. – ph33 Jan 31 '16 at 14:06
  • Are you solving the same problem as [this](http://stackoverflow.com/questions/34623156/r-from-a-vector-list-all-subsets-of-elements-so-their-sum-just-passes-a-value)? – Pierre L Jan 31 '16 at 14:19
  • Yes, this is the link that is posted in the question that I point here http://stackoverflow.com/questions/35107568/how-to-store-values-in-a-vector-with-nested-functions the code I posted here is the code you pointed Pierre with some modifications, however, for a big set (in my case 56) it has been running for several hours. For a small set the code works perfect but not for a big set. I think is because the permutations are around 56! / 23! – ph33 Jan 31 '16 at 14:28
  • @ph33 Does it suffice to do, say, 1 million random draws or does it have to be every possible combination? Because I wrote a script that can do 100k draws per second, but I'm not entirely sure if that is what you're looking for. It does allow you to estimate the mean difference. – slamballais Jan 31 '16 at 15:29
  • @Laterow simulations will be much appreciated, I think with simulations what is necessary to do is to take N samples of the deck and find the mean of the sum, am I right? However, is there any way to find the exact solutions? THANKS!!! – ph33 Jan 31 '16 at 15:50
  • @ph33 see answer below. – slamballais Jan 31 '16 at 16:09

1 Answers1

0

I don't know if it is possible to get an exact solution, simply because there are so many possible combinations. It is probably better to do simulations, i.e. write a script for 1 full draw and then rerun that script many times. Since the solutions are very similar, the simulation should give a pretty good approximation.

Ok, here goes:

set <- rep(2^(0:6), each = 8)
thr <- 500

fun <- function(set,thr){
    x <- cumsum(sample(set))
    value <- x[min(which(x >= thr))]
    value
}

system.time(a <- replicate(100000, fun(set,thr)))
# user  system elapsed 
# 1.10    0.00    1.09

mean(a - thr)
# [1] 21.22992

Explanation: Rather than drawing a card one at a time, I draw all cards simultaneously (sample) and then calculate the cumulative sum (cumsum). I then find the point where the cards at up to the threshold or larger, and find the sum of those cards back in x. We run this function many times with replicate, to obtain a vector of the values. We use mean(a-thr) to calculate the mean difference.

Edit: Made a really stupid typo in the code, fixed it now. Edit2: Shortened the function a little.

slamballais
  • 3,161
  • 3
  • 18
  • 29