How can I get a weighted random pick from Python's Counter class?

Question

I have a program where I'm keeping track of the success of various things using collections.Counter — each success of a thing increments the corresponding counter:

import collections
scoreboard = collections.Counter()

if test(thing):
    scoreboard[thing]+ = 1

Then, for future tests, I want to skew towards things which have generated the most success. Counter.elements() seemed ideal for this, since it returns the elements (in arbitrary order) repeated a number of times equal to the count. So I figured I could just do:

import random
nextthing=random.choice(scoreboard.elements())

But no, that raises TypeError: object of type 'itertools.chain' has no len(). Okay, so random.choice can't work with iterators. But, in this case, the length is known (or knowable) — it's sum(scoreboard.values()).

I know the basic algorithm for iterating through a list of unknown length and fairly picking an element at random, but I suspect that there's something more elegant. What should I be doing here?

@delnan — see comment on [larsks's answer](http://stackoverflow.com/a/9084700/479426) below. — mattdm, Jan 31 '12 at 18:29

score 7 · Answer 1 · edited Oct 16 '20 at 19:53

Given a dictionary of choices with corresponding relative probabilities (can be the count in your case), you can use the new random.choices added in Python 3.6 like so:

import random

my_dict = {
    "choice a" : 1, # will in this case be chosen 1/3 of the time
    "choice b" : 2, # will in this case be chosen 2/3 of the time
}

choice = random.choices(*zip(*my_dict.items()))[0]

For your code that uses Counter, you can do the same thing, because Counter also has the items() getter.

import collections
import random

my_dict = collections.Counter(a=1, b=2, c=3)
choice = random.choices(*zip(*my_dict.items()))[0]

Explanation: my_dict.items() is [('a', 1), ('b', 2), ('c', 3)].
So zip(*my_dict.items()) is [('a', 'b', 'c'), (1, 2, 3)].
And random.choices(('a', 'b', 'c'), (1, 2, 3)) is exactly what you want.

score 7 · Accepted Answer · answered Jan 31 '12 at 19:06

7

You can do this rather easily by using itertools.islice to get the Nth item of an iterable:

>>> import random
>>> import itertools
>>> import collections
>>> c = collections.Counter({'a': 2, 'b': 1})
>>> i = random.randrange(sum(c.values()))
>>> next(itertools.islice(c.elements(), i, None))
'a'

answered Jan 31 '12 at 19:06

Felix Loether

6,010
2
32
23

1

Is there a way to directly calculate the item rather than iterating through `i-1` elements? If c has small values, that's not a problem, but if one or more of the keys has a very high count, it will take a long time to iterate. – Brian Minton Dec 04 '15 at 15:49
As @BrianMinton hints, this has a worst-case runtime proportional to the sum of the counts in the Counter. If the counts are large, this will be super slow. – Mark Amery Dec 26 '19 at 16:05

score 3 · Answer 3 · answered Jan 31 '12 at 18:12

3

You could wrap the iterator in list() to convert it into a list for random.choice():

nextthing = random.choice(list(scoreboard.elements()))

The downside here is that this expands the list in memory, rather than accessing it item-by-item as would normally get with an iterator.

If you wanted to solve this iteratively, this algorithm is probably a good choice.

answered Jan 31 '12 at 18:12

larsks

277,717
41
399
399

3

Ideally, I'd like to avoid exploding the count into a gigantic list. Doing that negates the advantage of using `Counter` rather than just piling everything into a big container in the first place. – mattdm Jan 31 '12 at 18:29

score 3 · Answer 4 · answered Jan 31 '12 at 18:44

The following will get a random item where the score is the weighting for how often to return that item.

import random

def get_random_item_weighted(scoreboard):    
    total_scoreboard_value = sum(scoreboard.values())

    item_loc = random.random() * total_scoreboard_value
    current_loc = 0
    for item, score in scoreboard.items():
        current_loc += score
        if current_loc > item_loc:
            return item

for instance, if there are 2 items:

item1 has a score 5
item2 has a score 10

item2 will be returned twice as often as item1

score 2 · Answer 5 · answered Jul 11 '17 at 13:20

Another variant, Setup is a bit cumbersome, but lookup is in logarithmic complexity (suitable when several lookups are needed):

import itertools
import random
from collections import Counter
from bisect import bisect

counter = Counter({"a": 5, "b": 1, "c": 1})

#setup
most_common = counter.most_common()
accumulated = list(itertools.accumulate([x[1] for x in most_common])) # i.e. [5, 6, 7]
total_size = accumulated[-1]

# lookup
i = random.randrange(total_size)
print(most_common[bisect(accumulated, i)])

score 0 · Answer 6 · answered Jan 31 '12 at 19:10

Another variant with iteration:

import collections
from collections import Counter
import random


class CounterElementsRandomAccess(collections.Sequence):
    def __init__(self, counter):
        self._counter = counter

    def __len__(self):
        return sum(self._counter.values())

    def __getitem__(self, item):
        for i, el in enumerate(self._counter.elements()):
            if i == item:
                return el

scoreboard = Counter('AAAASDFQWERQWEQWREAAAAABBBBCCDDVBSDF')
score_elements = CounterElementsRandomAccess(scoreboard)
for i in range(10):
    print random.choice(score_elements)

This is broken for non-integer counts and inefficient for very large counts, both of which are legal. — Mark Amery, Dec 26 '19 at 12:26

How can I get a weighted random pick from Python's Counter class?

6 Answers6

Linked