
I am working on a project doing some Bayesian statistics in Python and am using NumPy's random binomial function. However, while putting in the keyword arguments/parameters, I am a bit confused about some of the basic theory behind it. My setup is the following:

import numpy as np

n = 1000
trial = np.random.binomial(n, .10, 1)

Problem: Say you flip a biased coin (p = .10).

Question: Is there a difference between 1000 tosses with .10 probability done once and 1 toss with .10 probability done 1000 times? Which is preferable? Is one more computationally efficient than the other?

ie. what is, if there is one, the difference between:

np.random.binomial(1000, .10, 1)

and

np.random.binomial(1, .10, 1000)
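For concreteness, the two calls return differently shaped arrays. The sample outputs below are only illustrative, since the draws are random:

np.random.binomial(1000, .10, 1)    # e.g. array([104]): one count of successes in 1000 tosses
np.random.binomial(1, .10, 1000)    # e.g. array([0, 1, 0, ..., 0]): 1000 individual 0/1 toss results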

Or, phrased a different way, what is the difference between the parameters of a distribution and the shape?

I have read the NumPy binomial docs found here.

If someone could explain the theory or basic intuition behind this that would be really helpful!

2 Answers


Consider this example:

In [51]: for i in range(10): print(np.random.binomial(10, 0.3))
2
3
5
2
4
2
5
3
1
2

It is equivalent to saying: if p = 0.3 and n = 10, meaning the coin is biased (0.3 chance of getting heads) and the number of trials is 10, how many heads will I get out of 10 tosses? Of course it is not going to be exactly 3 every time (in fact, you may even get 10 heads out of 10 tosses, albeit with very small probability), as demonstrated here by running it in a loop 10 times.

However, in NumPy we often will not run things in a loop. If we want to generate 10 such numbers, we can supply an additional size parameter to the call:

In [53]: np.random.binomial(10, 0.3, size=10)
Out[53]: array([3, 5, 4, 1, 2, 3, 2, 3, 3, 4])
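Tying this back to the question's setup: with n=1, each draw is a single toss, so the result is an array of 0s and 1s. One possible run (your values will differ):

In [54]: np.random.binomial(1, 0.3, size=10)
Out[54]: array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])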
CT Zhu
  • In `np.random.binomial(n=1, p=0.10, size=1000)`, yes, it is an array of 1000 elements of 0 and 1, but this is because `n=1`. Since there is only 1 toss, each result has to be either 1 or 0. In `np.random.binomial(n=1000, p=.10, size=1)`, the result is an array of 1 element: toss 1000 times and count the number of successes. – CT Zhu Sep 09 '18 at 16:58
  • Thank you, this makes a lot more sense now. –  Sep 09 '18 at 19:22
0

When the outcome of a single random trial is a "success" with probability p, we call that a Bernoulli trial with parameter p, a.k.a. a Bernoulli(p). The binomial distribution evaluates the probability of getting a particular number of "successes" if we run n independent Bernoulli trials with the same p. We call that a binomial distribution with parameters n and p, a.k.a. a binomial(n, p). By convention, Bernoulli successes are encoded as a 1, and failures as a 0. Consequently, a Bernoulli(p) and a binomial(1, p) are the same thing.

If we're generating values from a binomial(n, p), the outcome is a count of how many successes were observed in a realization of n trials. The number of successes is thus the sum of n Bernoulli results.

Now to NumPy. Invoking np.random.binomial(1000, .10, 1) generates one realization of a binomial with n = 1000 and p = 0.1. Using np.random.binomial(1, .10, 1000) generates 1000 realizations of a binomial with n = 1 and p = 0.1, i.e., 1000 Bernoulli(0.1)'s. These are not the same thing, but if you sum the Bernoulli values using sum(np.random.binomial(1, .10, 1000)) the result will have the same distribution as np.random.binomial(1000, .10, 1).
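A quick empirical check of that equivalence (a sketch; the numbers vary from run to run) is to compare the sample moments of many draws generated both ways. Both should land near the theoretical mean n*p = 100 and variance n*p*(1-p) = 90:

import numpy as np

draws = 10000

# 10000 realizations of binomial(1000, 0.1), drawn directly
direct = np.random.binomial(1000, 0.10, draws)

# 10000 realizations built by summing 1000 Bernoulli(0.1) results each
bern_sums = np.random.binomial(1, 0.10, (draws, 1000)).sum(axis=1)

print(direct.mean(), direct.var())        # both pairs should be close to 100 and 90
print(bern_sums.mean(), bern_sums.var())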

Is one of them better than the other? While they are interchangeable from a distributional perspective, there are more efficient computational methods to generate binomials than just generating n Bernoullis and summing them. Assuming the implementers of NumPy are competent, you're probably better off using np.random.binomial(1000, .10, 1).
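If you want to check the computational side yourself, here is a rough timing sketch using timeit (absolute numbers depend on your machine and NumPy version, but the single binomial call is typically faster):

import timeit

setup = "import numpy as np"

# one binomial(1000, 0.1) draw vs. 1000 Bernoulli draws plus a sum, each repeated 10000 times
t_binom = timeit.timeit("np.random.binomial(1000, 0.10, 1)", setup=setup, number=10000)
t_bern = timeit.timeit("np.random.binomial(1, 0.10, 1000).sum()", setup=setup, number=10000)

print(t_binom, t_bern)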

pjs