I'm looking for a reasonable definition of a function weighted_sample
that does not return just one random index for a list of given weights (which would be something like
def weighted_choice(weights, random=random):
""" Given a list of weights [w_0, w_1, ..., w_n-1],
return an index i in range(n) with probability proportional to w_i. """
rnd = random.random() * sum(weights)
for i, w in enumerate(weights):
if w<0:
raise ValueError("Negative weight encountered.")
rnd -= w
if rnd < 0:
return i
raise ValueError("Sum of weights is not positive")
to give a categorical distribution with constant weights) but a random sample of k
of those, without replacement, just as random.sample
behaves compared to random.choice
.
Just as weighted_choice
can be written as
lambda weights: random.choice([val for val, cnt in enumerate(weights)
for i in range(cnt)])
weighted_sample
could be written as
lambda weights, k: random.sample([val for val, cnt in enumerate(weights)
for i in range(cnt)], k)
but I would like a solution that does not require me to unravel the weights into a (possibly huge) list.
Edit: If there are any nice algorithms that give me back a histogram/list of frequencies (in the same format as the argument weights
) instead of a sequence of indices, that would also be very useful.