2

Let's assume that we got a list like the one below:

list = [[A,10,3],[B,5,2],[C,8,1]]

For each item in the list there is a probability to be chosen which could be calculated by softmax. for example for first element (A) we have:

from math import exp

A_probability = exp(list[0][2]/list[0][1] /
                     (exp(list[0][2]/list[0][1]) +
                      exp(list[1][2]/list[1][1]) +
                      exp(list[2][2]/list[2][1])))

How can I choose the items in the list randomly according to the calculated probablity for each one?

Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
Masoud Masoumi Moghadam
  • 1,094
  • 3
  • 23
  • 45

2 Answers2

4

I'm assuming that you have a pre-calculated list of probabilities (say probs) for each of the indexes in the list (say data) you want to choose from.

Further, probs and data obviously must have the same length and the entries of probs must be non-negative numbers summing to 1.

There is a neat but simple technique to randomly choose the indexes in data according to the distribution in probs that is known as the roulette wheel. In Python, I believe, it should look somehow like this

import random

data = ['A', 'B', 'C', 'D']

probs = [0.2, 0.4, 0.3, 0.1]

def roulette_wheel(probs):
    rand = random.random()
    for slot, prob in enumerate(probs):
        rand -= prob
        if rand < 0.0:
            return slot

Note that this can be generalized to a list of non-negative weights (that doesn't have to add up to 1) by multiplying rand by the term sum(weights). I believe, I first saw this cute idea in a book about Pascal programming some eons ago.

Edit:

As MadPhysicist suggested in a comment this can be made a lot more efficient if one needs to draw repeatedly from the same data. In that case, one can pre-compute the cumulative distribution function and then just do a binary search for the index such that cumulative prob. <= rand ~ U(0, 1). In Python, this could for example look somehow like the following

from random import random
from bisect import bisect_right


def cdf(probs):
    cdf = []
    total = 0.0
    for p in probs:
        total += p
        cdf.append(total)
    return cdf


def roulette_wheel_bisect(cdf):
    return bisect_right(cdf, random())

# compute cdf
cumsum = cdf(probs)

# randomly draw 10 indexes 
for i in range(0, 10):
    print(roulette_wheel_bisect(cumsum))

Disclaimer: I'm not a Python programmer by trade, so the code above should only illustrate the general idea. It may not be very robust for practical uses. You should always use a well-tested standard library, numpy for example, if you can.

Edit2:

I've just learned that numpy has numpy.random.choice which does exactly what you need. Example:

from numpy import random

data = ['A', 'B', 'C', 'D']
probs = [0.2, 0.4, 0.3, 0.1]

# randomly draw 10 list elements with replacement
for i in range(0, 10):
    print(random.choice(data, p=probs))
Community
  • 1
  • 1
Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
0

I know this is old but I wanted to do something similar recently and found this question. I concur with Stefan's last answer under the Edit2 using np.random.choice. In my testing np.random.choice became the most efficient way to do this as the sampling got larger (over 1000 samples). It also becomes more efficient as the samples increase to use a numpy array for the probability than using an auxiliary list containing 1000s of observations to select from (I assume because of the smaller storage). I would take advantage of the size parameter of np.random.choice then assign the sampling to a new array.

from numpy import random

data = ['A', 'B', 'C', 'D']
probs = [0.2, 0.4, 0.3, 0.1]
number_of_selections = 100000

# randomly draw 100000 list elements with replacement
indices = np.random.choice(len(data), size=number_of_selections, p=probs)

# assign to an array
sampling = data[indices]
print(sampling)
Dan
  • 758
  • 6
  • 20