45
import random
pos = ["A", "B", "C"]
x = random.choice(pos)

This code gives me either "A", "B" or "C" with equal probability. Is there a nice way to express it when you want "A" with 30%, "B" with 40% and "C" with 30% probability?

Christian
  • Searching found several similar/identical questions [here](http://stackoverflow.com/questions/526255/probability-distribution-in-python) and [here](http://stackoverflow.com/questions/1056151/random-python-dictionary-key-weighted-by-values) – snapshoe Nov 06 '10 at 14:26
  • 5
    @ma3: One of the weak points of this site is that as old questions tend to be ignored, there's no mechanism or motivation to improve on them. I think my answer is significantly better than at least the higher answers on those questions--havn't read the lower ones--but nobody would ever see it if I posted it on those. – Glenn Maynard Nov 06 '10 at 14:35
  • Starting with Python 3.6 you can use `random.choices` to submit a list of weights/probabilities. – Nick Aug 03 '20 at 03:51

6 Answers

53

Weights define a probability distribution function (pdf). Random numbers from any such pdf can be generated by applying its associated inverse cumulative distribution function to uniform random numbers between 0 and 1.

See also this SO explanation, or, as explained by Wikipedia:

If Y has a U[0,1] distribution then F⁻¹(Y) is distributed as F. This is used in random number generation using the inverse transform sampling method.

import random
import bisect
import collections

def cdf(weights):
    total = sum(weights)
    result = []
    cumsum = 0
    for w in weights:
        cumsum += w
        result.append(cumsum / total)
    return result

def choice(population, weights):
    assert len(population) == len(weights)
    cdf_vals = cdf(weights)
    x = random.random()
    idx = bisect.bisect(cdf_vals, x)
    return population[idx]

weights = [0.3, 0.4, 0.3]
population = 'ABC'
counts = collections.defaultdict(int)
for i in range(10000):
    counts[choice(population, weights)] += 1
print(counts)

# % test.py
# defaultdict(<class 'int'>, {'A': 3066, 'C': 2964, 'B': 3970})

The choice function above uses bisect.bisect, so selection of a weighted random variable is done in O(log n) where n is the length of weights.


Note that as of version 1.7.0, NumPy has a Cythonized np.random.choice function. For example, this generates 1000 samples from the population [0,1,2,3] with weights [0.1, 0.2, 0.3, 0.4]:

import numpy as np
np.random.choice(4, 1000, p=[0.1, 0.2, 0.3, 0.4])

np.random.choice also has a replace parameter for sampling with or without replacement.
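For instance, with `replace=False` the weights still apply, but each index can be drawn at most once (illustrative values):

```python
import numpy as np

# Draw 2 distinct indices from {0, 1, 2, 3}, weighted by p.
sample = np.random.choice(4, size=2, replace=False, p=[0.1, 0.2, 0.3, 0.4])
print(sample)
```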


A theoretically better algorithm is the Alias Method. It builds a table which requires O(n) time, but after that, samples can be drawn in O(1) time. So, if you need to draw many samples, in theory the Alias Method may be faster. There is a Python implementation of the Walker Alias Method here, and a numpy version here.
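For readers curious what the O(n) table construction looks like, here is a minimal sketch of Walker's alias method (not the linked implementations; all names are illustrative):

```python
import random

def build_alias_table(weights):
    """Walker's alias method: O(n) setup, then O(1) per sample.

    Each of the n buckets holds at most two outcomes: itself (with
    probability prob[i]) and an "alias" (with probability 1 - prob[i]).
    """
    n = len(weights)
    total = sum(weights)
    # Scale so the average bucket holds exactly probability 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large bucket donates mass to fill bucket s up to 1.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers are exactly full
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    # Pick a bucket uniformly, then flip its biased coin: O(1).
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```

With `prob, alias = build_alias_table([0.3, 0.4, 0.3])`, repeated calls to `alias_draw(prob, alias)` return indices 0, 1, 2 with roughly those frequencies, each draw costing only two random numbers and no search.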

jpyams
unutbu
30

Not... so much...

pos = ['A'] * 3 + ['B'] * 4 + ['C'] * 3
print(random.choice(pos))

or

pos = {'A': 3, 'B': 4, 'C': 3}
print(random.choice([x for x in pos for y in range(pos[x])]))
Ignacio Vazquez-Abrams
9

Here's a class to expose a bunch of items with relative probabilities, without actually expanding the list:

import bisect
class WeightedTuple(object):
    """
    >>> p = WeightedTuple({'A': 2, 'B': 1, 'C': 3})
    >>> len(p)
    6
    >>> p[0], p[1], p[2], p[3], p[4], p[5]
    ('A', 'A', 'B', 'C', 'C', 'C')
    >>> p[-1], p[-2], p[-3], p[-4], p[-5], p[-6]
    ('C', 'C', 'C', 'B', 'A', 'A')
    >>> p[6]
    Traceback (most recent call last):
    ...
    IndexError
    >>> p[-7]
    Traceback (most recent call last):
    ...
    IndexError
    """
    def __init__(self, items):
        self.indexes = []
        self.items = []
        next_index = 0
        for key in sorted(items.keys()):
            val = items[key]
            self.indexes.append(next_index)
            self.items.append(key)
            next_index += val

        self.len = next_index

    def __getitem__(self, n):
        if n < 0:
            n = self.len + n
        if n < 0 or n >= self.len:
            raise IndexError

        idx = bisect.bisect_right(self.indexes, n)
        return self.items[idx-1]

    def __len__(self):
        return self.len

Now, just say:

data = WeightedTuple({'A': 30, 'B': 40, 'C': 30})
random.choice(data)
Glenn Maynard
  • Note that I only sorted the keys before inserting (rather than just using `items.iteritems`) so it would have deterministic output, which makes the doctests simpler. – Glenn Maynard Nov 06 '10 at 14:05
  • Note that I avoided floating-point: if something can be done clearly with integers, it avoids any question of rounding error affecting the output, and it's much easier for the doctest to be comprehensive. Also, this doesn't care what the sum of the values is, eg. it's implicitly normalizing. For example, if you're picking from a list of weighted mirrors and you disable one of them, you don't have to adjust all of the rest so the weights add up to 1. – Glenn Maynard Nov 06 '10 at 14:51
  • 3
    Note that `sorted(dict.keys())` is a little redundant. `sorted(d)` does the same thing with one list less. – Thomas Wouters Nov 06 '10 at 14:59
  • I wrote a similar class based on this one, in Objective-C http://stackoverflow.com/a/22243637/129202 – Jonny Mar 07 '14 at 07:10
8

As of Python 3.6, there is random.choices for that.
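For example (weights are relative and need not sum to 1):

```python
import random

# k independent draws, with replacement, weighted 30/40/30.
picks = random.choices(["A", "B", "C"], weights=[30, 40, 30], k=10)
single = random.choices(["A", "B", "C"], weights=[30, 40, 30])[0]
```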

Original answer from 2010:

You can also use this form, which does not create an arbitrarily large list (and works with either integral or decimal probabilities):

pos = [("A", 30), ("B", 40), ("C", 30)]


from random import uniform

def w_choice(seq):
    total_prob = sum(item[1] for item in seq)
    chosen = uniform(0, total_prob)
    cumulative = 0
    for item, probability in seq:
        cumulative += probability
        if cumulative > chosen:
            return item
jsbueno
  • 1
    This can be further optimised by using bisect.bisect to identify the option - instead of iterating through all options. This reduces the iteration from O(N) to O(logN) where N is the number of the options. – Nick Aug 03 '20 at 14:01
5

There are some good solutions offered here, but I would suggest that you look at Eli Bendersky's thorough discussion of this issue, which compares various algorithms to achieve this (with implementations in Python) before choosing one.

Jeet
1

Try this:

import random
from decimal import Decimal

pos = {'A': Decimal("0.3"), 'B': Decimal("0.4"), 'C': Decimal("0.3")}
choice = random.random()
F_x = 0
for k, p in pos.items():
    F_x += p
    if choice <= F_x:
        x = k
        break
Jeff Bradberry