
Is there a way to ensure rbinom() returns at least one success?

I am running rbinom() with a really low probability:

rbinom(5015, size = 2, prob = 1/5000)

Since it is a probability distribution, there is a chance that all 0s are returned. Is there a way to ensure that at least one success is returned?

Ben Bolker
Sky Scraper

3 Answers


A simple while loop can be written to keep iterating until there is at least one success in the vector (here, defined as x):

x <- 0
while(sum(x) == 0) {
  x <- rbinom(5015, size=2, prob= 1/5000)
}

# check

table(x)
#x
#   0    1 
#5013    2 # here there were two
jpsmith
  • Selecting this solution. While it might not be as quick as @Ben Bolker's solution or have as much QC-ability as @Lil' Pete's solution, it is simple and works for my application. – Sky Scraper Apr 24 '23 at 22:52
  • Thanks @SkyScraper! Happy to add in more bells and whistles if you'd like, but I saw no need for them and given some loops may run a while, it may cost computational efficiency. But happy to edit if you want some additional QC – jpsmith Apr 24 '23 at 23:35
  • not necessary, not required for my application, where I just want to avoid a distribution of all zeros being output (even though this is a valid distribution). Thank you! @jpsmith – Sky Scraper Apr 25 '23 at 16:29

The short answer is no, but (depending on your application) you could continue to generate binomial outcomes until you have a success.

out = rbinom(5015, size=2, prob= 1/5000)
resp = any(out>0)
count = 1
while (!resp) {
    count = count + 1
    out = c(out, rbinom(5015, size = 2, prob = 1/5000))
    resp = any(out>0)
}
cat("\nNumber of iterations =",count)
Lil' Pete

An alternative to the while loop suggestion: if the values are all zero, pick one at random and set it to 1 instead.

n <- 5015
r <- rbinom(n, size = 2, prob = 1/5000)
if (all(r == 0)) {
   r[sample(n, size = 1)] <- 1
}

This is sort of hacky, but potentially more efficient than repeated simulation (e.g., if you used the while suggestion with n = 1000 and prob = 1e-10 you could be waiting a long time ...). Since it's hard to think of a principled way to modify the statistical distribution to enforce this (since it effectively requires defining a multivariate distribution), this approach might be as good as anything else.
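As a back-of-the-envelope check of that efficiency point (a sketch added here, not part of the original answer; the helper function name is my own): the while-loop approach repeats until at least one success appears, so its expected number of iterations is 1 / P(at least one success):

```r
# Probability that a sample of n Binomial(size, prob) draws
# contains at least one success: 1 - P(every draw is zero).
p_any_success <- function(n, size, prob) 1 - (1 - prob)^(size * n)

# Expected number of while-loop iterations is the reciprocal:
1 / p_any_success(5015, 2, 1/5000)  # ~1.16: the loop almost always exits on the first try
1 / p_any_success(1000, 2, 1e-10)   # ~5e6: rejection sampling becomes impractical
```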

Ben Bolker
  • I dunno, I'm not really on board with that since it changes the distribution in an unknown way. (Maybe it's exactly right, but we would have to establish that.) The rejection sampling approach mentioned by others is at least known to be correct, and whether it's efficient enough depends on the specifics of OP's situation. If OP comes back and says rejection sampling is too inefficient then we'll have to take another look. – Robert Dodier Apr 23 '23 at 18:48
  • I would argue that it's not too easy to characterize the distribution "binomial draw conditional on no zeros in the entire data set" either ... (i.e., the result of the conditional sampling is *no longer* a simple draw of N iid binomial values, since it depends on the full multivariate outcome ...) – Ben Bolker Apr 23 '23 at 20:34
  • Whatever that distribution might be, rejection sampling is a correct way to get a sample from it. – Robert Dodier Apr 23 '23 at 20:37
  • 1
    I like this shortcut and works for my application, where an all 0 output results in NA data so I need at least one 1, but I could see how it would not be appropriate for all applications. While I may use this solution, I will not choose it as the "correct" answer so a future viewer whose problem cannot be solved by this approach does not use this solution. – Sky Scraper Apr 24 '23 at 22:50
  • @SkyScraper I thought about it some more, and the method suggested here is incorrect, by more than a little. Working from the problem statement, the number of 1's has a binomial distribution with n = 5015 and p = 1/5000, so P(0) = (1 - p)^n approx= 1/e, P(1) = 5015 . p . (1 - p)^(n - 1) approx= 1/e as well, so P(2 or more) approx= 1 - 2/e = 0.26. The correct P(1 given at least 1) is P(1)/(P(1) + P(2 or more)) = 0.58, but the method stated here gives P(1 given at least 1) as P(0) + P(1) = 0.73. I wouldn't say that's an acceptable approximation, unless the goal is just to get some 0's and 1's. – Robert Dodier Apr 24 '23 at 23:18
  • Hmm. Is your `P(0)` "probability of zero values of `x=1` in a full sample"? Because by my calculation (`n <- 5015; p <- 1/5000; exp(n*dbinom(0,prob = p, size =2, log = TRUE))`) `P(0)` is 0.13 (you seem to be assuming `size=1` here?). I'm not clear on all of your logic; maybe this would be a good CrossValidated question? – Ben Bolker Apr 24 '23 at 23:36
  • Yes, by P(k) I mean the probability of k 1's. You're right, I mishandled the `size` argument. The point stands, though; P(1) is inflated relative to P(k) for k > 1. A direct Monte Carlo estimate yields P(0) through P(9) as approximately 0.1277, 0.2728, 0.2742, 0.1835, 0.0905, 0.0331, 0.0137, 0.0034, 0.0010, and 0.0001, so P(1) should be 0.2728/(1 - 0.1277) = 0.31, but replacing all 0's with one 1 yields P(1) = 0.1277 + 0.2728 = 0.40. – Robert Dodier Apr 25 '23 at 00:22
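The disagreement in these comments is easy to check numerically. A minimal sketch (added here, not from the thread; variable names are my own): the total number of successes across a sample of n Binomial(2, p) draws is itself Binomial(2n, p), so both schemes can be compared directly:

```r
set.seed(1)
n <- 5015
p <- 1/5000

# Exact probability that a full sample contains no successes at all:
p0 <- exp(n * dbinom(0, size = 2, prob = p, log = TRUE))
round(p0, 2)  # 0.13, matching Ben Bolker's calculation

# Simulate the total number of successes per sample directly:
nsim <- 100000
successes <- rbinom(nsim, size = 2 * n, prob = p)

# Rejection sampling (the while-loop answers): P(exactly 1 | at least 1)
p1_reject <- mean(successes[successes > 0] == 1)

# Replacing an all-zero sample with a single 1: P(exactly 1) = P(0) + P(1)
adjusted <- ifelse(successes == 0, 1, successes)
p1_replace <- mean(adjusted == 1)

c(reject = p1_reject, replace = p1_replace)  # roughly 0.31 vs 0.40
```

The two probabilities differ noticeably, in line with Robert Dodier's corrected figures, which is why the replacement shortcut changes the conditional distribution while rejection sampling does not.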