53

I have a collection of items in a list, and I want to choose from that list according to another list of weights.

For example, if my collection is ['one', 'two', 'three'] and the weights are [0.2, 0.3, 0.5], then I would expect the method to give me 'three' in about half of all draws.

What is the easiest way to do so?

Zeugma
Mischa Obrecht

7 Answers

95

Since NumPy version 1.7 you can use numpy.random.choice():

elements = ['one', 'two', 'three'] 
weights = [0.2, 0.3, 0.5]

from numpy.random import choice
print(choice(elements, p=weights))
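Since the question asks about draw frequencies, a quick sanity check is to draw many samples at once with the `size` parameter and compare against the weights (this snippet is my addition, not part of the original answer):

```python
from numpy.random import choice

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]

# Draw 10,000 samples in one call; the empirical frequency of 'three'
# should come out near its weight of 0.5.
draws = choice(elements, size=10_000, p=weights)
print((draws == 'three').mean())  # roughly 0.5
```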
Matt
Quant Metropolis
43

Since Python 3.6, you can do weighted random choice (with replacement) using random.choices.

random.choices(population, weights=None, *, cum_weights=None, k=1)

Example usage:

import random
random.choices(['one', 'two', 'three'], [0.2, 0.3, 0.5], k=10)
# ['three', 'two', 'three', 'three', 'three',
#  'three', 'three', 'two', 'two', 'one']
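If you call `random.choices` repeatedly with the same weights, the `cum_weights` parameter lets you precompute the running totals once instead of re-summing on every call; a small sketch of that (my addition, reusing the weights from the question):

```python
import random
from itertools import accumulate

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]

# Precompute cumulative weights once; random.choices then skips the
# internal accumulation step on each call.
cum = list(accumulate(weights))
sample = random.choices(elements, cum_weights=cum, k=5)
print(sample)
```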
Esteis
11

This function takes two arguments: a list of weights and a list containing the objects to choose from:

from numpy import cumsum
from numpy.random import rand
def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    cs = cumsum(weights)  # the weights, cumulatively summed
    idx = sum(cs < rand())  # index of the first cumulative weight above a random value
    return objects[idx]

It does not use any explicit Python loops.

Gareth Latty
Mischa Obrecht
  • The comments appear to be misleading. `cumsum()` gives the cumulative values, not boolean values. To be clear, this does work, but the comments don't match what is actually happening. – Gareth Latty May 29 '12 at 16:34
  • I have edited to fix, and also put the docstring on one line, as recommended in [PEP 257](http://www.python.org/dev/peps/pep-0257/#one-line-docstrings). – Gareth Latty May 29 '12 at 16:42
  • Assuming the weights are positive, cs is a sorted list. Using numpy.searchsorted will result in a significant speed-up for finding the index. – Nick R Feb 18 '19 at 23:03
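A sketch of Nick R's suggestion (the function name is mine; as in the answer above, the weights are assumed to sum to 1):

```python
import numpy as np

def weighted_choice_searchsorted(weights, objects):
    """Like weightedChoice above, but binary-searches the cumulative
    sums with numpy.searchsorted instead of summing a boolean mask."""
    cs = np.cumsum(weights)
    # searchsorted returns the count of cumulative sums strictly below
    # the random draw, which is exactly the index we want.
    return objects[np.searchsorted(cs, np.random.rand())]
```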
5

You can use the multinomial distribution (from numpy) to do what you want. E.g.

elements = ['one', 'two', 'three'] 
weights = [0.2, 0.3, 0.5]


import numpy as np

indices = np.random.multinomial(100, weights, 1)
#=> array([[20, 32, 48]]), YMMV

results = []  # a list of the original items, repeated the correct number of times
for i, count in enumerate(indices[0]):
    results.extend([elements[i]] * count)

So the element in first position came up 20 times, the element in second position came up 32 times, and the element in third position came up 48 times, roughly what you would expect given the weights.

If you're having a hard time wrapping your head around the multinomial distribution, I found the documentation really helpful.

Maus
  • Note you can reduce your building of results to `itertools.chain.from_iterable([elements[i]]*count for i, count in enumerate(indices[0]))`, which will be faster. – Gareth Latty May 29 '12 at 17:03
  • In fact, you can improve it even further by replacing the list multiplication with `itertools.repeat(elements[i], count)` too. – Gareth Latty May 29 '12 at 17:09
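Putting both of those comments together gives one possible sketch (variable names are mine):

```python
from itertools import chain, repeat
import numpy as np

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]

counts = np.random.multinomial(100, weights)
# chain.from_iterable + repeat yields the repeated items lazily,
# avoiding the intermediate lists that [elements[i]]*count would build.
results = list(chain.from_iterable(
    repeat(elements[i], count) for i, count in enumerate(counts)))
print(len(results))  # 100
```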
4

If you don't want to use numpy, you can follow the same method with something like this:

from random import random
from itertools import takewhile

def accumulate(iterator):
    """Returns a cumulative sum of the elements.
    accumulate([1, 2, 3, 4, 5]) --> 1 3 6 10 15"""
    current = 0
    for value in iterator:
        current += value
        yield current

def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    limit = random()
    return objects[sum(takewhile(bool, (value < limit for value in accumulate(weights))))]

We use itertools.takewhile() to stop checking values once we reach the point where we want to stop; otherwise, this is essentially the same idea as Mischa Obrecht's answer, just without numpy.
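Since Python 3.2 an accumulate helper ships in the standard library as `itertools.accumulate`, and `bisect` can replace the linear scan with a binary search; a sketch under those assumptions (function name is mine):

```python
from bisect import bisect
from itertools import accumulate
from random import random

def weighted_choice_bisect(weights, objects):
    """Same idea as above, using the stdlib itertools.accumulate
    (Python 3.2+) and a binary search; scaling the random draw by the
    total means the weights need not sum to 1 here."""
    cumulative = list(accumulate(weights))
    return objects[bisect(cumulative, random() * cumulative[-1])]
```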

Gareth Latty
2

How about just initializing your list to match your choices with the expected weights? Here I make a list of 100 values representing your desired "pull" percentages.

>>> import random
>>> elements = ['one', 'two', 'three'] 
>>> weights = [0.2, 0.3, 0.5]
>>>
>>> # get "sum" of result list of lists (flattens list)
>>> choices = sum([[element] * int(weight * 100) for element, weight in zip(elements, weights)], [])
>>> random.choice(choices)
three

It's not cumulative, but it looks like it might be what you're looking for.

monkut
  • Looks like it has the same effect, but allocating a 3*100 vector just to do a choice seems a bit of overkill. Especially if I would use it in the context the problem first came up in, which is a Monte Carlo simulation, where you want to be as fast as possible... – Mischa Obrecht Jun 12 '12 at 12:15
  • You should add that info to the question. But you're only allocating the list once; calling `random.choice()` will be fast. – monkut Jun 13 '12 at 00:34
  • Yes, but I'd say if there is a cheap way and an expensive way to achieve the same result, it goes without saying that one chooses the cheap one. Judge's ruling? :) – Mischa Obrecht Jun 14 '12 at 07:05
1

To build upon Maus' answer, which is great if you want to repeatedly get weighted random values: if you only want a single value, you can do this very simply by combining numpy.random.multinomial() and itertools.compress():

from itertools import compress
from numpy.random import multinomial

def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    return next(compress(objects, multinomial(1, weights, 1)[0]))
Gareth Latty