3

I am trying to figure out how an algorithm that generates pseudorandom numbers that are not uniformly distributed works. The book I am reading gives a pseudocode implementation as follows

// Pick an item from a array with given probabilities.

Item: PickItemWithProbabilities(Item: items[],
Float: probabilities[])
Float: value = <PRNG value where 0 <= value < 1>
For i = 0 To items.Length - 1
value = value – probabilities[i]
If (value <= 0) Then Return items[i]
Next i
End PickItemWithProbabilities

I have implemented the pseudocode in python and it works as expected, if I run the function a sufficient amount of rounds, the distribution is as per the fed probability values. The thing is I don't exactly understand why and how it works.

Here is the code I implemented from the pseudocode.

from random import random

colors = ["pink", "green", "yellow"]
probabilities = [0.7, 0.2, 0.1]
trials = 1000000

def non_uniform_random_picker(array, probabilities):
    rand_value = random()
    for i in range(len(array)):
        rand_value -= probabilities[i]
        if rand_value <= 0:
            return array[i]

results = dict()

for i in range(trials):
    value = non_uniform_random_picker(colors, probabilities)

    if results.get(value) == None:
        results[value] = 1
    else:
        results[value] += 1


for key in results:    
    percentage = round(((results[key]/trials) * 100), 2)
    print(f"{key}:{percentage}% ")




I need help figuring out how the algorithm is working.

Wechuli
  • 45
  • 6
  • see the legendary [Understanding “randomness”](https://stackoverflow.com/a/3956538/2521214) that should cover what happens when you combine more uniform pseudo random values – Spektre Sep 11 '19 at 11:23

1 Answers1

7

You can create a non-uniform distribution from a uniform distribution using the cumulative function (see https://en.wikipedia.org/wiki/Inverse_transform_sampling#The_method)

The cumulative function for distribution P(x) is the sum of probabilities <= x.

In this case, we have distribution [0.7, 0.2, 0.1]
The cumulative is: [.7, .9, 1.0]

If we select a uniformly random variable between 0 and 1 (i.e. random.uniform(0, 1)), and compare and output (rand_value) to the cumulative distribution we would expect:

  • 70% of the time, rand_value would be < .7
  • 20% of the time, rand_value would be between .7 and .9
  • 10% of the time, rand_value to be between .9 and 1

The code:

rand_value = random()
for i in range(len(array)):
    rand_value -= probabilities[i]
    if rand_value <= 0:

Stops when: probabilities[0] + probabilities[1] + ...probabilities[i] >= rand_value

This is the same as the cumulative distribution up to index i.

By choosing index i according to this stop condition we have created the desired non-uniform sampling of our list.

DarrylG
  • 16,732
  • 2
  • 17
  • 23