33

I need to return different values based on a weighted round-robin such that 1 in 20 gets A, 1 in 20 gets B, and the rest go to C.

So:

A => 5%
B => 5%
C => 90%

Here's a basic version that appears to work:

import random

x = random.randint(1, 100)

if x <= 5:
    return 'A'
elif x > 5 and x <= 10:
    return 'B'
else:
    return 'C'

Is this algorithm correct? If so, can it be improved?

doremi
  • 14,921
  • 30
  • 93
  • 148
  • 1
    You could use `random.randint(1,20)` for your case. – Akavall Feb 21 '13 at 00:31
  • @Akavall - How so? (1,20) would only return only let me eval if only A or B fell into a 5% range, but not both - right? – doremi Feb 21 '13 at 00:39
  • 1
    Your random integer can take value 1 to 20, if random integer is 1, you return A (5% chance), if random integer is 2, you return B (5% chance), if random integer is anything else you return C (90% likelihood). Am I missing something? – Akavall Feb 21 '13 at 00:42
  • 2
    6 in 1, half a dozen in the other. Your logic works now that I think about it. – doremi Feb 21 '13 at 00:44
  • 4
    In case you wanted to research this in the more general case, the concept you are referring to is "generating random variates using the inverse cdf method." – Joel Cornett Feb 21 '13 at 00:50
  • I'd like to point out an elaborate article that covers this very well: http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python – ToonAlfrink Feb 18 '15 at 09:21
  • I actually prefer the solution you provided in your question, which saves memory space comparing with the below answers that leverage an extra list of characters/strings. Thanks! – Leo Sep 17 '20 at 19:26

4 Answers4

62

Your algorithm is correct, how about something more elegant:

import random
my_list = ['A'] * 5 + ['B'] * 5 + ['C'] * 90
random.choice(my_list)
jurgenreza
  • 5,856
  • 2
  • 25
  • 37
  • 3
    Why not just `my_list = ['A'] * 5 + ['B'] * 5 + ['C'] * 90`? – Joel Cornett Feb 21 '13 at 00:47
  • you are right, updated. – jurgenreza Feb 21 '13 at 00:48
  • As an aside, I'm going to be implementing this in Lua, not Python - but still - very cool! – doremi Feb 21 '13 at 00:49
  • 9
    +1, but as an aside, it's not necessary to generate 100 list items, as long as the number of items remains proportional. In this instance, you can do `['A', 'B'] + ['C'] * 18`. – Joel Cornett Feb 21 '13 at 00:53
  • 3
    @JoelCornett Thought about that, but I think it is more readable this way. Thanks for the correction. – jurgenreza Feb 21 '13 at 00:56
  • Doesn't need to be a list: eg. `random.choice('A' * 5 + 'B' * 5 + 'C' * 90)` – John La Rooy Feb 12 '14 at 00:17
  • 3
    In this limited case, it may not matter, but this approach is inefficient in general, both in terms of time and space. A better solution is to assign ranges in [0,1) corresponding to the proportion of the weight, e.g., [0, 5/100) for A, [5/100, 10/100) for B and [10/100, 1) for C, with the appropriate approximations/rounding when dealing with such things like repeating decimals, and then generating a random number with random.random or random.uniform. – danielm Jan 30 '17 at 16:56
  • 2
    As of Python 3.6, you can use [random.choices](https://docs.python.org/3/library/random.html#random.choices). And if you care about speed, the next two answers are faster. – hostingutilities.com Feb 23 '18 at 06:04
  • 1
    One of the most elegant answers I have seen on Python stack overflow. This made me happy. – Micheal J. Roberts Aug 01 '19 at 13:41
38

that's fine. more generally, you can define something like:

from collections import Counter
from random import randint

def weighted_random(pairs):
    total = sum(pair[0] for pair in pairs)
    r = randint(1, total)
    for (weight, value) in pairs:
        r -= weight
        if r <= 0: return value

results = Counter(weighted_random([(1,'a'),(1,'b'),(18,'c')])
                  for _ in range(20000))
print(results)

which gives

Counter({'c': 17954, 'b': 1039, 'a': 1007})

which is as close to 18:1:1 as you can expect.

andrew cooke
  • 45,717
  • 10
  • 93
  • 143
  • You are assuming the input weights are in ascending order. Sort the input weights first if you want to be safe. – Ben P Sep 12 '14 at 18:11
  • 3
    It is not neccesary - such algorithm decrements the counter for each entry, so the result is the same (statistically) – Luis Masuelli Sep 19 '14 at 18:45
  • Adding to what Luis Masuelli said, for any given probability the range of values which correspond is: the sum of all previous elements in the array to that number plus the probability value. e.g. [.05, .05, .5, .4] correspond approx to .00-.05 | .051 - .10 | .101 - .6 | .601 - 1 ranges – Erich Jun 04 '17 at 11:26
  • 4
    This is better than the accepted answer in my opinion. I get that Python code should strive for readability, but just throwing away memory like it is is crazy. Especially if you're using this in simulations where the weights are constantly changing. Interpreter will be building and cleaning up sooo many lists – Cruncher Apr 29 '18 at 12:47
9

If you want to use weighted random and not percentile random, you can make your own Randomizer class:

import random

class WeightedRandomizer:
    def __init__ (self, weights):
        self.__max = .0
        self.__weights = []
        for value, weight in weights.items ():
            self.__max += weight
            self.__weights.append ( (self.__max, value) )

    def random (self):
        r = random.random () * self.__max
        for ceil, value in self.__weights:
            if ceil > r: return value

w = {'A': 1.0, 'B': 1.0, 'C': 18.0}
#or w = {'A': 5, 'B': 5, 'C': 90}
#or w = {'A': 1.0/18, 'B': 1.0/18, 'C': 1.0}
#or or or

wr = WeightedRandomizer (w)

results = {'A': 0, 'B': 0, 'C': 0}
for i in range (10000):
    results [wr.random () ] += 1

print ('After 10000 rounds the distribution is:')
print (results)
Hyperboreus
  • 31,997
  • 9
  • 47
  • 87
0

It seems correct since you are using a uniform random variable with independent draws the probability for each number will be 1/n (n=100).

You can easily verify your algorithm by running it say 1000 time and see the frequency for each letter.

Another algorithm you might consider is to generate an array with your letters given the frequency you want for each letter and only generate a single random number which is the index in the array

It will be less efficient in memory but should perform better

Edit:

To respond to @Joel Cornett comment, an example will be very similar to @jurgenreza but more memory efficient

import random
data_list = ['A'] + ['B'] + ['C'] * 18
random.choice(data_list )
iTech
  • 18,192
  • 4
  • 57
  • 80