
I have been sampling random numbers from different distributions and just realized how slow NumPy's binomial random numbers are compared to other distributions. For instance:

%timeit for x in range(100): np.random.binomial(100,0.5)
10000 loops, best of 3: 82.6 µs per loop
%timeit for x in range(100): np.random.uniform()
100000 loops, best of 3: 14.6 µs per loop

A binomial number takes about 6 times longer than a uniform one! That might be understandable, since the binomial is discrete and requires a more complex transformation. But if I ask for a binomial with a number of trials n=0 or n=1, the time spent is similar:

%timeit for x in range(100): np.random.binomial(0,0.5)
10000 loops, best of 3: 78.8 µs per loop

%timeit for x in range(100): np.random.binomial(1,0.5)
10000 loops, best of 3: 80.1 µs per loop

This does not seem very efficient, because the result of these samplings should be trivial: for zero trials the result should always be zero, and for one trial it should be a simple Bernoulli trial. So, for instance, a faster implementation of the binomial would be:

import numpy as np

def custombinomial(n, p):
    # Trivial cases: no trials, or a single Bernoulli trial
    if n == 0:
        return 0
    if n == 1:
        return 1 if np.random.uniform() < p else 0
    # Fall back to numpy for the general case
    return np.random.binomial(n, p)

And here is the timing:

%timeit for x in range(100): custombinomial(0,0.5)
100000 loops, best of 3: 11.8 µs per loop

%timeit for x in range(100): custombinomial(1,0.5)
10000 loops, best of 3: 31.2 µs per loop

I am sure this could be improved even for larger values of n. Is there a reason I am missing for NumPy being so slow? Is there another library that can give faster random numbers (even if it involves some C/Cython)?

Additionally, I know that NumPy is good if I want to create a bunch of random numbers at once, i.e. get an array of binomially distributed numbers, but in many cases the parameters n and p of the distribution will change on the fly, so calling for individual random numbers would not directly be an option. Would an alternative be possible in which an array of uniformly distributed random numbers is generated in advance and then transformed into the particular binomials as they are required?
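Something along these lines is what I have in mind (just a rough sketch; scipy.stats.binom.ppf is one possible inverse-CDF transform, and the names here are my own, so treat it as an illustration rather than a benchmark):

import numpy as np
from scipy.stats import binom

# Pre-generate a pool of uniforms, then invert the binomial CDF on demand
uniforms = np.random.uniform(size=100000)
idx = 0

def next_binomial(n, p):
    """Draw one Binomial(n, p) variate from the next pre-drawn uniform."""
    global idx
    u = uniforms[idx]
    idx += 1
    # binom.ppf returns the smallest k with CDF(k) >= u
    # (u == 0.0 would need special handling; ignored here)
    return int(binom.ppf(u, n, p))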

Ruben Perez-Carrasco

1 Answer


NumPy's legacy binomial random generator is implemented in C, and the algorithm uses numerical inversion when the parameters are sufficiently small. This can be more work than necessary when p = 0.5, since random bits rather than random doubles could be used instead in the binomial generator. In addition, the basic algorithm apparently hasn't changed for years (see also mtrand.pyx), so it doesn't take advantage of vectorization or multithreading, for example.
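To illustrate the "random bits" idea for the special case p = 0.5 (this is only a sketch, not NumPy's actual implementation): a Binomial(n, 0.5) variate is simply the number of set bits among n random bits.

import numpy as np

def binomial_half(n):
    # Draw enough random bytes to cover n bits, unpack them into bits,
    # and count the ones among the first n bits.
    n_bytes = (n + 7) // 8
    raw = np.random.bytes(n_bytes)
    bits = np.unpackbits(np.frombuffer(raw, dtype=np.uint8))
    return int(bits[:n].sum())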

Moreover, in the early days of NumPy there wasn't "much cause to change the distribution methods that much", so this and other random-generation algorithms in NumPy were retained in the name of reproducible "randomness". However, this has changed in version 1.17 and later: breaking changes to random generation methods, such as a new binomial generator, are now allowed, but they are treated as new features that will be introduced only "on X.Y releases, never X.Y.Z". For details, see "RNG Policy" and "Random Sampling (numpy.random)".
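For completeness, here is what the new-style interface introduced in NumPy 1.17 looks like (whether and how much it speeds up binomial sampling depends on which NumPy version you have):

import numpy as np

rng = np.random.default_rng()               # new Generator API (PCG64 by default)
single = rng.binomial(100, 0.5)             # one draw
batch = rng.binomial(100, 0.5, size=1000)   # vectorized draws in one call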

If having faster binomial random variates matters to you, you should file a new Numpy issue.

EDIT (Nov. 9): Code for legacy distributions was moved.

Peter O.