
I want to speed up the code below - namely the for loop. Is there a way to do it in numpy?

import numpy as np
# define seed and random state
rng = np.random.default_rng(0)
# num of throws
N = 10**1
# max number of trials
total_n_trials = 10
# generate the throws' distributions of "performance" - overall scores
# mu_throws = rng.normal(0.7, 0.1, N)
mu_throws = np.linspace(0,1,N)

all_trials = np.zeros(N*total_n_trials)
for i in range(N):
    # generate trials per throw as Bernoulli trials with given mean
    all_trials[i*total_n_trials:(i+1)*total_n_trials] = rng.choice([0,1], size=total_n_trials, p=[1-mu_throws[i],mu_throws[i]])
    

More explanations: I want to generate N sequences of Bernoulli trials (i.e. 0s and 1s, called throws) where each sequence has a mean (probability p) given by the values in another array (mu_throws). These can be sampled from a normal distribution or, in this case, I took them for simplicity to be a sequence of N=10 numbers from 0 to 1. The approach above works but is slow. I expect N to be at least $10^4$, and total_n_trials can then be anything on the order of hundreds to (tens of) thousands (and this is run several times). I have checked the following post but didn't find an answer. I also know that numpy random choice can generate multidimensional arrays, but I didn't find a way to set a different p for each row. Basically, I want the same as what I do above, just reshaped:

all_trials.reshape(N,-1)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
       [1., 0., 0., 1., 0., 0., 0., 1., 1., 0.],
       [1., 0., 1., 0., 0., 1., 0., 1., 0., 1.],
       [1., 0., 1., 0., 0., 0., 1., 1., 0., 0.],
       [1., 0., 0., 1., 0., 1., 0., 1., 1., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

I also thought about this trick but haven't figured out how to use it for Bernoulli trials. Thanks for any tips.

norok2
My Work

3 Answers


While np.random.Generator.binomial() (or the legacy np.random.binomial()) is probably the cleanest way of computing what you are after (as suggested in @MechanicPig's answer), it looks like it is not the fastest.

Actually, at the input sizes you indicated as relevant for you, the OP's approach can even be faster.

Instead, an approach based on thresholding a uniformly distributed random array (essentially a polished version of @SalvatoreDanieleBianco's answer) seems to be the fastest.

Also, the newer random generators seem to be faster.

Finally, the thresholding approach can also be implemented in Numba, but this does not result in a significant speed-up.

Here are some equivalent approaches:

import numpy as np
import numba as nb
import random


def bin_samples_OP(n, mu, seed=0):
    m = len(mu)
    rng = np.random.default_rng(seed)
    result = np.zeros((m, n), dtype=np.uint8)
    for i in range(m):
        result[i, :] = rng.choice([0, 1], size=n, p=[1 - mu[i], mu[i]])
    return result


def bin_samples_binom(n, mu, seed=0):
    np.random.seed(seed)
    return np.random.binomial(1, mu[:, None], (len(mu), n))


def bin_samples_binom2(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return rng.binomial(1, mu[:, None], (len(mu), n))


def bin_samples_rand(n, mu, seed=0):
    np.random.seed(seed)
    return (np.random.random_sample((len(mu), n)) < mu[:, None]).astype(np.uint8)


def bin_samples_rand2(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.random(size=(len(mu), n)) < mu[:, None]).astype(np.uint8)


def bin_samples_rand3(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.random(size=(len(mu), n)).T < mu).astype(np.uint8).T


@nb.njit
def bin_samples_nb(n, mu, seed=0):
    m = len(mu)
    random.seed(seed)
    result = np.empty((m, n), dtype=np.uint8)
    for i in range(m):
        for j in range(n):
            result[i, j] = random.random() < mu[i]
    return result

Following are a couple of tests to check that they are equivalent:

funcs = (
    bin_samples_OP, bin_samples_binom, bin_samples_binom2,
    bin_samples_rand, bin_samples_rand2, bin_samples_rand3,
    bin_samples_nb)


n = 10
m = 5
mu = np.linspace(0, 1, m)

print(f"{'Target':>24}  {mu}")
for i, func in enumerate(funcs, 1):
    res = func(n, mu)
    mu_exp = np.mean(res, -1)
    is_good = np.isclose(mu, mu_exp, atol=0.1, rtol=0.1).astype(np.uint8)
    print(f"{func.__name__:>24}  {mu_exp}  {is_good}")
                  Target  [0.   0.25 0.5  0.75 1.  ]
          bin_samples_OP  [0.   0.31 0.53 0.73 1.  ]  [1 1 1 1 1]
       bin_samples_binom  [0.   0.22 0.5  0.77 1.  ]  [1 1 1 1 1]
      bin_samples_binom2  [0.   0.31 0.56 0.68 1.  ]  [1 1 1 1 1]
        bin_samples_rand  [0.   0.22 0.5  0.77 1.  ]  [1 1 1 1 1]
       bin_samples_rand2  [0.   0.22 0.47 0.77 1.  ]  [1 1 1 1 1]
       bin_samples_rand3  [0.   0.22 0.47 0.77 1.  ]  [1 1 1 1 1]
          bin_samples_nb  [0.   0.35 0.53 0.78 1.  ]  [1 1 1 1 1]
         bin_samples_pnb  [0.   0.22 0.5  0.78 1.  ]  [1 1 1 1 1]

(bin_samples_pnb() is the same as bin_samples_nb() but with @nb.njit(parallel=True) and range() replaced by nb.prange().)

and their respective timings for input sizes closer to what you indicated:

timings = {}
for p in range(12, 14):
    for q in range(12, 15):
        n = 2 ** p
        m = 2 ** q
        mu = np.linspace(0, 1, m)
        print(f"n={n}, m={m};  (p={p}, q={q})")
        timings[n, m] = []
        for func in funcs:
            res = func(n, mu)
            mu_exp = np.mean(res, axis=-1)
            
            is_good = np.allclose(mu, mu_exp, atol=0.1, rtol=0.1)
            timed = %timeit -r 1 -n 1 -o -q func(n, mu)
            timings[n, m].append(timed.best)
            print(f"{func.__name__:>24}  {is_good}  {float(timed.best * 10 ** 3):>10.4f} ms")
n=4096, m=4096;  (p=12, q=12)
          bin_samples_OP  True    533.5690 ms
       bin_samples_binom  True    517.4861 ms
      bin_samples_binom2  True    487.3975 ms
        bin_samples_rand  True    202.7566 ms
       bin_samples_rand2  True    135.5819 ms
       bin_samples_rand3  True    139.0170 ms
          bin_samples_nb  True    110.7469 ms
         bin_samples_pnb  True    141.4636 ms
n=4096, m=8192;  (p=12, q=13)
          bin_samples_OP  True   1031.1586 ms
       bin_samples_binom  True   1035.9100 ms
      bin_samples_binom2  True    957.9257 ms
        bin_samples_rand  True    392.7915 ms
       bin_samples_rand2  True    246.8801 ms
       bin_samples_rand3  True    250.9648 ms
          bin_samples_nb  True    222.7287 ms
         bin_samples_pnb  True    270.5297 ms
n=4096, m=16384;  (p=12, q=14)
          bin_samples_OP  True   2055.6572 ms
       bin_samples_binom  True   2036.8609 ms
      bin_samples_binom2  True   1865.3614 ms
        bin_samples_rand  True    744.9770 ms
       bin_samples_rand2  True    493.8508 ms
       bin_samples_rand3  True    476.5701 ms
          bin_samples_nb  True    449.1700 ms
         bin_samples_pnb  True    523.8716 ms
n=8192, m=4096;  (p=13, q=12)
          bin_samples_OP  True    886.8198 ms
       bin_samples_binom  True   1021.3093 ms
      bin_samples_binom2  True    946.2922 ms
        bin_samples_rand  True    372.9902 ms
       bin_samples_rand2  True    221.8423 ms
       bin_samples_rand3  True    234.7383 ms
          bin_samples_nb  True    218.5655 ms
         bin_samples_pnb  True    276.3884 ms
n=8192, m=8192;  (p=13, q=13)
          bin_samples_OP  True   1744.0382 ms
       bin_samples_binom  True   2101.7884 ms
      bin_samples_binom2  True   1985.6555 ms
        bin_samples_rand  True    720.7352 ms
       bin_samples_rand2  True    462.9820 ms
       bin_samples_rand3  True    453.3408 ms
          bin_samples_nb  True    455.7062 ms
         bin_samples_pnb  True    541.0242 ms
n=8192, m=16384;  (p=13, q=14)
          bin_samples_OP  True   3490.9539 ms
       bin_samples_binom  True   4106.5607 ms
      bin_samples_binom2  True   3719.5692 ms
        bin_samples_rand  True   1363.6868 ms
       bin_samples_rand2  True    884.4729 ms
       bin_samples_rand3  True    868.3783 ms
          bin_samples_nb  True    888.8390 ms
         bin_samples_pnb  True   1030.0389 ms

These indicate that bin_samples_rand2() / bin_samples_rand3() (which perform essentially equivalently) and bin_samples_nb() are in the same ballpark.

Forcing parallelism in Numba is not going to help with speed (because LLVM can produce better-optimized code on its own than with Numba's parallelization model), and it will also make you lose predictability (reproducibility) of the output.
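As a self-contained summary, here is a minimal sketch of the pure-NumPy thresholding variant that comes out fastest above, together with a quick sanity check of the per-row means (the function name bernoulli_matrix is my own, not from any of the answers):

```python
import numpy as np


def bernoulli_matrix(n, mu, seed=0):
    # Row i contains n Bernoulli trials with success probability mu[i]:
    # a uniform draw in [0, 1) falls below mu[i] with probability mu[i].
    # mu[:, None] has shape (len(mu), 1) and broadcasts across the columns.
    rng = np.random.default_rng(seed)
    return (rng.random((len(mu), n)) < mu[:, None]).astype(np.uint8)


mu = np.linspace(0, 1, 5)
samples = bernoulli_matrix(10_000, mu)
print(samples.shape)            # (5, 10000)
print(samples.mean(axis=1))     # per-row means track mu
```

With n in the thousands, the empirical row means land within a few standard errors of mu, which is all the equivalence tests above check for.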

norok2
  • Hi norok2 and thanks for the summary. Just a note: the broadcasting version, the polished version of @SalvatoreDanieleBianco's, seems to be slower than the original with transpositions, at least based on my measurements. I don't really get why though, any thoughts? Have you tested that as well? – My Work Jun 03 '22 at 16:27
  • Hi MyWork, I updated my answer. I see no indication of either being significantly different from the other. After all, there is only one extra view in @SalvatoreDanieleBianco's approach, which is pretty fast (a few ns), and the precision of the timing is dominated by other factors. – norok2 Jun 03 '22 at 16:43
  • I know where the difference was coming from, sorry, my bad. You are initialising the random state with every call of the function and that caused the difference... – My Work Jun 03 '22 at 16:47
  • Thanks a lot for your answer, it helped me a lot. I eventually decided to accept @Salvatore Daniele Bianco's answer because I felt like it has the original idea which you then purified and tested, so I wanted to credit that. It was a hard choice... I upvoted your answer as well, thanks again. – My Work Jun 05 '22 at 08:37
  • I upvoted that answer myself ;-) – norok2 Jun 05 '22 at 11:38
N = 11
mu_throws = np.linspace(0,1,N)
total_n_trials = 10_000

rng = np.random.default_rng(0)
all_trials = (rng.random((N, total_n_trials)).T<mu_throws).T.astype(int)
all_trials # shape (N, total_n_trials)

output:

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ...,
       [1, 0, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 0, 1, 1],
       [1, 1, 1, ..., 1, 1, 1]])

Basically, what I'm doing is generating random real numbers in the interval [0, 1) and converting them into boolean results according to the given probabilities (in mu_throws).
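As a side note (this is what the comments below suggest), the two transposes can be avoided by making mu_throws broadcastable instead; with the same seed this produces the identical array:

```python
import numpy as np

N = 11
total_n_trials = 10_000
mu_throws = np.linspace(0, 1, N)

rng = np.random.default_rng(0)
# mu_throws[:, None] has shape (N, 1), so it broadcasts across the
# columns of the (N, total_n_trials) random array; no transposes needed.
all_trials = (rng.random((N, total_n_trials)) < mu_throws[:, None]).astype(int)
```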

If you compare mu_throws (the target probabilities) with the probabilities estimated from all_trials, you get:

np.c_[mu_throws, all_trials.mean(1)]

output:

array([[0.    , 0.    ],
       [0.1   , 0.1003],
       [0.2   , 0.1963],
       [0.3   , 0.305 ],
       [0.4   , 0.4006],
       [0.5   , 0.5056],
       [0.6   , 0.5992],
       [0.7   , 0.6962],
       [0.8   , 0.7906],
       [0.9   , 0.8953],
       [1.    , 1.    ]])

For the N and total_n_trials values from your example, the time required on my machine is about 0.00014 s, vs about 0.00127 s for your loop.

  • Thanks, just to be consistent, I'd use the recommended random state: `(rng.random((N, total_n_trials)).T – My Work Jun 03 '22 at 13:11
  • @MyWork ok! It's even better. – Salvatore Daniele Bianco Jun 03 '22 at 13:31
  • Instead of transposing the random array, you could make `mu_throws` broadcastable via either the `reshape()` method or slicing with a `np.newaxis` / `None` axis, which should be faster. – norok2 Jun 03 '22 at 14:13
  • @norok2 Yes, you are right. Indeed I also tried that strategy (reshaping the `mu_throws` vector), but I noticed that (empirically) it was a little slower on: `(rng.random ((N, total_n_trials)) – Salvatore Daniele Bianco Jun 03 '22 at 14:18

Use numpy.random.binomial:

>>> n = 10
>>> total_n_trials = 10
>>> mu_throws = np.linspace(0, 1, n)
>>> np.random.binomial(1, mu_throws[:, None], (n, total_n_trials))
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
       [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
       [0, 1, 1, 1, 1, 0, 1, 1, 1, 0],
       [1, 0, 1, 1, 1, 1, 1, 0, 1, 0],
       [1, 1, 1, 0, 1, 0, 1, 0, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

NumPy now recommends the new random Generator API, so you can consider using numpy.random.Generator.binomial instead:

>>> rng = np.random.default_rng()
>>> rng.binomial(1, mu_throws[:, None], (n, total_n_trials))
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
       [0, 0, 0, 0, 1, 0, 1, 0, 1, 0],
       [1, 0, 0, 1, 1, 0, 1, 1, 1, 0],
       [0, 1, 0, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1, 0, 1, 0],
       [1, 1, 1, 1, 1, 0, 0, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int64)
Mechanic Pig
  • Although it requires `0.0002377033233642578` sec on my machine (that is double the time of my solution), I think it is the cleanest solution. – Salvatore Daniele Bianco Jun 03 '22 at 09:56
  • On my machine it's still the same time, but I think it depends on the hardware. – Salvatore Daniele Bianco Jun 03 '22 at 10:19
  • @SalvatoreDanieleBianco Sorry, my measurement is wrong. I didn't generate an array of the same size as you. – Mechanic Pig Jun 03 '22 at 10:22
  • I believe the `10` should be `total_n_trials`, I cannot propose an edit – My Work Jun 03 '22 at 13:07
  • @MyWork I modified the answer. I'm sorry I didn't write this variable, because I was writing the answer on my mobile phone at that time. It was inconvenient to type. – Mechanic Pig Jun 03 '22 at 13:12
  • No problem, thanks for your answer and edit :), I will wait a bit longer if maybe other answers will appear – My Work Jun 03 '22 at 13:15
  • Thanks a lot for your answer, it was a direction I was looking at but couldn't see. I eventually decided to accept @Salvatore Daniele Bianco's answer because his approach was closer to my goal (speed), so I wanted to credit that. But yours followed the original spirit and it was a hard choice... I upvoted your answer as well, thanks again. – My Work Jun 05 '22 at 08:39