0

I want to simulate repeated flips of a fair coin using Python. Two approaches seem reasonable, as shown in the code below. Are they equivalent please, and if so is there any good reason to prefer one over the other?

import random
import numpy as np

n = 10
p = 0.5

print(np.random.binomial(n, p))

results = random.choices([True, False], k=n)
print(results.count(True))
Robin Andrews
  • 3,514
  • 11
  • 43
  • 111

2 Answers2

3

In effect they are equivalent.

  • random.choices selects either True or False with replacement, then counts the number of times True was chosen this way.
  • binomial returns the sum of 10 Bernoulli trials with the given p parameter. Since p = 0.5 here, this is equivalent to choices.

The random.choices approach has the advantage that no external libraries are required, since the random module is part of the Python standard library and is thus available by default in Python. On the other hand, a binomial distribution lets you simulate the number of heads from flipping biased coins, not just fair coins.

However, numpy.random.* functions, such as numpy.random.binomial, have become legacy functions as of NumPy 1.17, and their algorithms are expected to remain as they are for backward compatibility reasons. That version didn't deprecate any numpy.random.* functions, however, so they are still available for the time being. See also this question. If you use NumPy 1.17 or later, you should make use of the new system introduced in version 1.17, including numpy.random.Generator, in newer applications. The following example uses Generator's binomial method:gen=np.random.default_rng(); print(gen.binomial(10, 0.5)). See also: Are numpy binomial random numbers inefficient?

There is a third equivalent which is offered in NumPy: the choice method. Note that again we use NumPy's Generator rather than the legacy pseudorandom system.

gen=np.random.default_rng();
print(np.sum(gen.choice([0,1],size=10)))
Peter O.
  • 32,158
  • 14
  • 82
  • 96
0

Further to what Peter said: for your parameters there's no appreciable difference between your alternatives, they both just take a microsecond or two to execute.

If you increase n a bit, e.g. to 1000, then the numpy version will start to be significantly faster. This is because it doesn't need to build a list and then count elements in that list, it knows how to calculate the exact value.

If you increase n to a few billion, then you have to use the numpy version as you'll not have enough memory to store the intermediate list produced by choices.

Asymptotically, choices scales as O(n), while Numpy's binomial is O(1).

Sam Mason
  • 15,216
  • 1
  • 41
  • 60