I am trying to create a huge boolean
matrix which is randomly filled with True
and False
with a given probability p
. At first I used this code:
N = 30000
p = 0.1
np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
But sadly it does not seem to terminate for this big N
. So I tried to split it up into the generation of the single rows by doing this:
N = 30000
p = 0.1
mask = np.empty((N, N))
for i in range (N):
mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
if (i % 100 == 0):
print(i)
Now, there happens something strange (at least on my device): The first ~1100 rows are very fastly generated - but after it, the code becomes horribly slow. Why is this happening? What do I miss here? Are there better ways to create a big matrix which has True
entries with probability p
and False
entries with probability 1-p
?
Edit: As many of you assumed that the RAM will be a problem: As the device which will run the code has almost 500GB RAM, this won't be a problem.