
What I am doing at the moment:

import numpy as np

d = 60000
n_choice = 1000
n_samples = 5000

rng = np.random.default_rng(123)

# some probability vector
p = rng.random(d)
p = p / np.sum(p)

samples = np.empty((n_choice, n_samples), dtype=np.int64)

for i in range(n_samples):
    samples[:, i] = rng.choice(d, size=n_choice, replace=False, p=p, shuffle=False)

This is a bit slow for my taste. Is there a way to speed this up? E.g., by replacing the loop with a trick or using some other form of simulation?
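For context, one loop-free alternative I've seen is the Gumbel top-k trick: add i.i.d. Gumbel noise to log(p) and take the indices of the n_choice largest keys per row, which is distributed like successive weighted draws without replacement. A minimal sketch (the function name is mine, and the demo uses smaller sizes, since the full d=60000 by n_samples=5000 key matrix needs roughly 2.4 GB; batch over n_samples if memory is tight):

```python
import numpy as np

rng = np.random.default_rng(123)

def weighted_sample_without_replacement(p, n_choice, n_samples, rng):
    """Each column is a size-n_choice weighted sample without
    replacement from range(len(p)), via the Gumbel top-k trick."""
    # keys = log(p) + Gumbel noise; the n_choice largest keys per row
    # are distributed like successive draws without replacement.
    # (Requires p > 0 elementwise; zero-weight entries give -inf keys.)
    keys = np.log(p) + rng.gumbel(size=(n_samples, len(p)))
    # argpartition is O(d) per row; no full sort of the keys is needed.
    return np.argpartition(-keys, n_choice, axis=1)[:, :n_choice].T

# small demo
d, n_choice, n_samples = 1000, 50, 200
p = rng.random(d)
p = p / np.sum(p)
samples = weighted_sample_without_replacement(p, n_choice, n_samples, rng)
```

But I'm not sure this is the fastest option here, hence the question.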

I skimmed through similar questions on Stack Overflow, but only found this one, where the weights are uniform and d = n_choice, and this one, where weights are given but only the rows (columns) of the samples array have to be unique.
