(Python) algorithm to randomly select a key based on proportionality/weight

Question

I'm a bit at a loss as to how to find a clean algorithm for doing the following:

Suppose I have a dict k:

>>> k = {'A': 68, 'B': 62, 'C': 47, 'D': 16, 'E': 81}

I now want to randomly select one of these keys, based on the 'weight' they have in the total (i.e. sum) amount of keys.

>>> sum(k.values()) 
>>> 274

So that there's a

>>> 68.0/274.0
>>> 0.24817518248175183

24.81% percent change that A is selected.

How would you write an algorithm that takes care of this? In other words, that makes sure that on 10.000 random picks, A will be selected 2.481 times?

Note that in your last sentence, you cannot guarantee that A will be picked exactly 2481 times if you want to choose letters independently at random with a constant probability. If you want a guarantee like that you should create a fixed pool of 10000 letters in the distribution you require and draw letters without replacement. — Mark Byers, Apr 03 '10 at 08:58
I see. This seems to be related to my remark to Paul Hankin's code. Thanks for clarifying (and setting my mind at ease). — Mathieu Dhondt, Apr 03 '10 at 10:12

score 12 · Accepted Answer · answered Apr 03 '10 at 09:40

12

Here's a weighted choice function, with some code that exercises it.

import random

def WeightedPick(d):
    r = random.uniform(0, sum(d.itervalues()))
    s = 0.0
    for k, w in d.iteritems():
        s += w
        if r < s: return k
    return k

def Test():
    k = {'A': 68, 'B': 62, 'C': 47, 'D': 16, 'E': 81}
    results = {}
    for x in xrange(10000):
        p = WeightedPick(k)
        results[p] = results.get(p, 0) + 1
    print results

Test()

answered Apr 03 '10 at 09:40

Great, thanks! This code returns exactly the right results. Thanks for your quick replies, everyone. – Mathieu Dhondt Apr 03 '10 at 09:56
But still: the exact weight is not attained (a test with 100 million rnd's gave me 24.8203% for "A"). Should I worry about this (i.e. is the code to blame?) or is this just the nature of randomness? – Mathieu Dhondt Apr 03 '10 at 10:08
2

@LaundroMat Yes, the probability converges as the number of trials increases but will never be exact. You can read about variance or standard deviation to learn what you can expect. – Apr 03 '10 at 10:33

score 10 · Answer 2 · answered Apr 03 '10 at 09:39

10

This should do the trick:

>>> k = {'A': 68, 'B': 62, 'C': 47, 'D': 16, 'E': 81}
>>> import random
>>> def weighted_pick(dic):
...     total = sum(dic.itervalues())
...     pick = random.randint(0, total-1)
...     tmp = 0
...     for key, weight in dic.iteritems():
...         tmp += weight
...         if pick < tmp:
...             return key

answered Apr 03 '10 at 09:39

Dave Kirby

25,806
5
67
84

Why do you do total-1? – user2767074 Nov 14 '13 at 16:40
1

@user2767074 because `randint` is inclusive. – wchargin Feb 24 '14 at 23:38

score 5 · Answer 3 · answered Apr 03 '10 at 08:52

5

The algorithm would be this..

Select a number randomly between 1 and 274. To do that, call a rand() funciton (assume it returns a value between 0 and 1), multiply rand() by 274. The resulting value should now be in a range. If its between 1 and 68, select A, if its between 69 and 130 select B and so on. This way, your probability stays alive and your operation succeeds.

PS: I am a Java guy, dont know the syntax of Python.

answered Apr 03 '10 at 08:52

bragboy

34,892
30
114
171

If it exists you should use a function/method that returns an integer in a certain range instead of scaling a floating-point value to stay clear of aliasing problems. – Joey Apr 03 '10 at 08:53
Exactly, the scaling method would be a cast. I mean, i would typically cast the resulting value from the rand method by casting it to an integer. – bragboy Apr 03 '10 at 08:56

score 2 · Answer 4 · answered Apr 03 '10 at 08:52

The simplest way of doing this when your weights are relatively small integers (such as in your example) is to build a long string containing all the characters in the appropriate weights and choose a character at random from it:

import random
d = {'A': 68, 'B': 62, 'C': 47, 'D': 16, 'E': 81}
s = ''.join(k*v for k,v in d.items())
random.choice(s)

Note though that this method will use quite a lot of memory if your weights are large, in which case you might prefer a different solution.

score 2 · Answer 5 · edited Jan 22 '15 at 07:26

2

one should also look at this link

make two list for k, say xk and yk

from scipy import stats
custm = stats.rv_discrete(name='test', values=(xk, yk))
custm.rvs(size=1)

edited Jan 22 '15 at 07:26

rtruszk

3,902
13
36
53

answered Jan 22 '15 at 06:58

Vedsar Kushwaha

313
2
11

bart · Answer 6 · 2010-04-03T11:34:00.560

-1

I've developed an algorithm a few years ago, with application in Perl and SQL, you can read about it here, complete with analysis and tests why it (most likely) is correct.

The concept is simple: for each item, pick a random number, pull it through some function that depends on the items' weight, and pick the item with the lowest value.

That function is:

x[i] = -log(1 - rand())/weight[i]

edited Apr 03 '10 at 11:34

answered Apr 03 '10 at 11:28

bart

7,640
3
33
40

Clever trick, but it's a horrible idea to use this in practice. It's not obviously correct (despite the pages of maths proof), and it's expensive to compute (it requires N rand() calls, and N log() calls). Also, it's likely to have numerical problems. When 1 - rand() is close to 0, the log can be arbitrarily large; perhaps overflowing. – Apr 05 '10 at 08:54

(Python) algorithm to randomly select a key based on proportionality/weight

6 Answers6

Linked

Related