How to choose an item in a list according to a specific probability?

Question

Let's assume that we got a list like the one below:

list = [[A,10,3],[B,5,2],[C,8,1]]

For each item in the list there is a probability to be chosen which could be calculated by softmax. for example for first element (A) we have:

from math import exp

A_probability = exp(list[0][2]/list[0][1] /
                     (exp(list[0][2]/list[0][1]) +
                      exp(list[1][2]/list[1][1]) +
                      exp(list[2][2]/list[2][1])))

How can I choose the items in the list randomly according to the calculated probablity for each one?

possible duplicate of [Pythonic way to select list elements with different probability](http://stackoverflow.com/questions/4113307/pythonic-way-to-select-list-elements-with-different-probability) — chickity china chinese chicken, Apr 20 '17 at 21:53

score 4 · Accepted Answer · edited May 23 '17 at 11:54

I'm assuming that you have a pre-calculated list of probabilities (say probs) for each of the indexes in the list (say data) you want to choose from.

Further, probs and data obviously must have the same length and the entries of probs must be non-negative numbers summing to 1.

There is a neat but simple technique to randomly choose the indexes in data according to the distribution in probs that is known as the roulette wheel. In Python, I believe, it should look somehow like this

import random

data = ['A', 'B', 'C', 'D']

probs = [0.2, 0.4, 0.3, 0.1]

def roulette_wheel(probs):
    rand = random.random()
    for slot, prob in enumerate(probs):
        rand -= prob
        if rand < 0.0:
            return slot

Note that this can be generalized to a list of non-negative weights (that doesn't have to add up to 1) by multiplying rand by the term sum(weights). I believe, I first saw this cute idea in a book about Pascal programming some eons ago.

Edit:

As MadPhysicist suggested in a comment this can be made a lot more efficient if one needs to draw repeatedly from the same data. In that case, one can pre-compute the cumulative distribution function and then just do a binary search for the index such that cumulative prob. <= rand ~ U(0, 1). In Python, this could for example look somehow like the following

from random import random
from bisect import bisect_right


def cdf(probs):
    cdf = []
    total = 0.0
    for p in probs:
        total += p
        cdf.append(total)
    return cdf


def roulette_wheel_bisect(cdf):
    return bisect_right(cdf, random())

# compute cdf
cumsum = cdf(probs)

# randomly draw 10 indexes 
for i in range(0, 10):
    print(roulette_wheel_bisect(cumsum))

Disclaimer: I'm not a Python programmer by trade, so the code above should only illustrate the general idea. It may not be very robust for practical uses. You should always use a well-tested standard library, numpy for example, if you can.

Edit2:

I've just learned that numpy has numpy.random.choice which does exactly what you need. Example:

from numpy import random

data = ['A', 'B', 'C', 'D']
probs = [0.2, 0.4, 0.3, 0.1]

# randomly draw 10 list elements with replacement
for i in range(0, 10):
    print(random.choice(data, p=probs))

Using numpy, you can pre-process `probs` with `np.cumsum` and then just do a binary search on the result. — Mad Physicist, Apr 21 '17 at 21:53
In fact, if you don't mind, I'd like to post an answer about that. Or you can include it in your own. — Mad Physicist, Apr 21 '17 at 21:53
@Mad Physicist Please post your answer, I don't mind. I'm not a Python programmer. Just dabbling here :) — Stefan Zobel, Apr 21 '17 at 21:54
Honestly I didn't expect someone probably answer my question in such details. Thank you so much. — Masoud Masoumi Moghadam, Apr 26 '17 at 10:13
@Masoud Masoumi Moghadam Always glad to help. Well, if it solves your problem, then please don't forget to accept and upvote :) — Stefan Zobel, Apr 26 '17 at 10:20

score 0 · Answer 2 · answered Aug 01 '23 at 19:46

I know this is old but I wanted to do something similar recently and found this question. I concur with Stefan's last answer under the Edit2 using np.random.choice. In my testing np.random.choice became the most efficient way to do this as the sampling got larger (over 1000 samples). It also becomes more efficient as the samples increase to use a numpy array for the probability than using an auxiliary list containing 1000s of observations to select from (I assume because of the smaller storage). I would take advantage of the size parameter of np.random.choice then assign the sampling to a new array.

from numpy import random

data = ['A', 'B', 'C', 'D']
probs = [0.2, 0.4, 0.3, 0.1]
number_of_selections = 100000

# randomly draw 100000 list elements with replacement
indices = np.random.choice(len(data), size=number_of_selections, p=probs)

# assign to an array
sampling = data[indices]
print(sampling)

How to choose an item in a list according to a specific probability?

2 Answers2