I want to speed up the code below - namely the for loop. Is there a way to do it in numpy?
import numpy as np
# define seed and random state
rng = np.random.default_rng(0)
# num of throws
N = 10**1
# max number of trials
total_n_trials = 10
# generate the throws' distributions of "performance" - overall scores
# mu_throws = rng.normal(0.7, 0.1, N)
mu_throws = np.linspace(0,1,N)
all_trials = np.zeros(N*total_n_trials)
for i in range(N):
    # generate trials per throw as Bernoulli trials with given mean
    all_trials[i*total_n_trials:(i+1)*total_n_trials] = rng.choice([0,1], size=total_n_trials, p=[1-mu_throws[i],mu_throws[i]])
More explanation: I want to generate N sequences of Bernoulli trials (i.e. 0s and 1s, called throws), where each sequence has a mean (probability p) given by a value in another array (mu_throws). These values can be sampled from a normal distribution; here, for simplicity, I took a sequence of N=10 evenly spaced numbers from 0 to 1. The approach above works but is slow. I expect N to be at least $10^4$, and total_n_trials can then be anything on the order of hundreds to (tens of) thousands (and this is run several times). I have checked the following post but didn't find an answer. I also know that numpy random choice can generate multidimensional arrays, but I didn't find a way to set a different p for each row. Basically, I want the same as what I do above, just reshaped:
all_trials.reshape(N,-1)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
       [1., 0., 0., 1., 0., 0., 0., 1., 1., 0.],
       [1., 0., 1., 0., 0., 1., 0., 1., 0., 1.],
       [1., 0., 1., 0., 0., 0., 1., 1., 0., 0.],
       [1., 0., 0., 1., 0., 1., 0., 1., 1., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
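To illustrate the limitation with choice mentioned above, here is a minimal sketch. The names shared_p, same_p_trials, per_row_p and rejected are made up for this illustration, and 0.3 is an arbitrary probability. A 2-D size is accepted, but as far as I can tell p has to be a single 1-D vector shared by all rows:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
total_n_trials = 10
mu_throws = np.linspace(0, 1, N)

# A 2-D size works, but every row is drawn with the same probabilities.
# shared_p = 0.3 is an arbitrary illustrative value, not taken from mu_throws.
shared_p = 0.3
same_p_trials = rng.choice([0, 1], size=(N, total_n_trials), p=[1 - shared_p, shared_p])
print(same_p_trials.shape)

# Trying to pass one probability pair per row (shape (N, 2)) is rejected,
# because p is expected to be 1-dimensional.
per_row_p = np.column_stack([1 - mu_throws, mu_throws])
rejected = False
try:
    rng.choice([0, 1], size=(N, total_n_trials), p=per_row_p)
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```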
I also thought about this trick but haven't figured out how to use it for Bernoulli trials. Thanks for any tips.
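For comparison, one possible vectorized direction, offered as an unbenchmarked sketch rather than a confirmed answer: a Bernoulli trial with probability p can be drawn by checking whether a uniform number in [0, 1) falls below p, so the per-row probabilities can be broadcast against a 2-D array of uniform draws. The names uniform_draws, trials_2d and all_trials_vectorized are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
total_n_trials = 10
mu_throws = np.linspace(0, 1, N)

# One uniform draw in [0, 1) per trial; a trial is a success (1) when the draw
# falls below the probability of its row. mu_throws[:, None] has shape (N, 1)
# and broadcasts across the total_n_trials columns.
uniform_draws = rng.random((N, total_n_trials))
trials_2d = (uniform_draws < mu_throws[:, None]).astype(float)

# Flatten to match the 1-D layout of all_trials in the loop version.
all_trials_vectorized = trials_2d.ravel()
print(trials_2d)
```

This consumes the random stream differently from the loop, so the individual 0s and 1s will not match the output above even with the same seed; only the distribution is intended to be the same. rng.binomial(1, mu_throws[:, None], size=(N, total_n_trials)) should be another way to express the same idea.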