
My question is an extension of this question: Weighted random numbers

I'm trying to implement weighted random numbers. I'm currently just banging my head against the wall and cannot figure this out.

In my project (Hold'em hand ranges, subjective all-in equity analysis), I'm using Boost's random functions. So, let's say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost's Mersenne twister generator works like a charm for this. However, I want the pick to be weighted, for example like this:

1 (weight: 90), 2 (weight: 56), 3 (weight: 4)

Does Boost have some sort of functionality for this?

The extension: the user is allowed to dynamically change the weight of a given key.

How does one optimally solve the problem?

The naive solution is to rebuild the cumulative weights of all elements whenever a single weight changes, but that's O(n) per update. It's very inefficient. How do we do better?

I want update(key, w) and get() to be better than or equal to O(log n).

nz_21
    Why the python tag? – eike Nov 19 '19 at 10:55
  • Did you notice the `discrete_distribution` in the Boost.Random documentation? – Shawn Nov 19 '19 at 11:00
  • @Shawn, it is in the `std` now. – Evg Nov 19 '19 at 11:01
  • @GiovanniCerretani Yes, yes it is. However, OP specified that he's using the Boost library, not the standard one. – Shawn Nov 19 '19 at 11:04
  • However, `discrete_distribution` has no interface to change probabilities. The only way is to create a new `discrete_distribution`. – Evg Nov 19 '19 at 11:05
  • @Evg [std::discrete_distribution::param](https://en.cppreference.com/w/cpp/numeric/random/discrete_distribution/param) should do it actually, and is provided also by Boost.Random. – Giovanni Cerretani Nov 19 '19 at 11:08
  • @GiovanniCerretani, can you give an example of how it can be used? I don't see a way to use `param` to change a single probability without constructing either new `discrete_distribution` or `discrete_distribution::param_type`. – Evg Nov 19 '19 at 11:17
  • @Evg you're right, actually the complexity using `param` is O(n). – Giovanni Cerretani Nov 19 '19 at 11:50

6 Answers


One possible solution comes from arithmetic coding and Fenwick trees.

If you have a list of non-negative numbers [a_0, ..., a_n] of type T, the Fenwick tree data structure allows you to implement the following two functions in O(log n) time:

  1. Index upper_bound(T p): for the given value p, calculate the smallest index i such that the prefix sum a_0 + ... + a_i > p.
  2. set(Index i, T p): Update a_i <- p.

The algorithm for generating a random i is simple: generate a random number k uniformly distributed in the range [0, sum a_i), then take i = upper_bound(k).

Simple example:

i            0 1 2 3 4 5 6 7
a_i          0 1 0 0 3 4 0 2
prefix_sum   0 1 1 1 4 8 8 10

k                   0 1 2 3 4 5 6 7 8 9
i = upper_bound(k)  1 4 4 4 5 5 5 5 7 7

P.Fenwick. A New Data Structure for Cumulative Frequency Tables (PDF, 1994)

My C++ implementation of a Fenwick tree (not thoroughly tested)

Evg
  • Nice! This is the only solution that satisfies the time complexity criteria. I guess I'll have to start reading papers to pass interviews :( – nz_21 Nov 19 '19 at 12:07
  • @nz_21, this solution is fast, but I'm not sure it is the best. If you find an alternative fast solution, please let me know. – Evg Nov 19 '19 at 12:08
  • Evg fast enough for the interviewer's constraints :) – nz_21 Nov 19 '19 at 12:10
  • Note that `lower_bound` works best if all the weights are greater than 0. – Peter O. Nov 19 '19 at 19:01
  • Notably, `lower_bound` can cause incorrect results if the first item's weight is zero. – Peter O. Nov 19 '19 at 19:44
  • @PeterO., `lower_bound` seems to be the wrong function. It should be `upper_bound`. Then no special treatment of zero frequencies is needed. Updated the answer. Thanks for pointing to this error! – Evg Nov 19 '19 at 21:18

You have both Python and C++ tagged. I'm not sure about Python, but in C++ this is actually part of the STL. Take a look at piecewise_constant_distribution.

Pickle Rick
  • And how can you "dynamically change the weight of a given key"? – Evg Nov 19 '19 at 11:18
  • The RNG takes the weight distribution as an argument, so just change it on the next call? How it's changed is up to OP. – Pickle Rick Nov 19 '19 at 11:20
  • That is, recreate the distribution object. But I guess that's not what OP wants. – Evg Nov 19 '19 at 11:27
  • Well you don't need to actually recreate it, you can store it as a field somewhere and change it when needed. The RNG engine is still the same either way so not sure why this wouldn't work for OP. – Pickle Rick Nov 19 '19 at 11:30
  • Each time you want to change the `i`-th probability, you have to create a new distribution. This is an `O(n)` operation, but OP wants something faster than that. – Evg Nov 19 '19 at 11:46
  • Ah, I didn't realize you were referring to speed with that comment. – Pickle Rick Nov 19 '19 at 19:22

With Python's numpy there is a function numpy.random.choice that allows you to set probabilities (that sum to 1). So with your weights you could do:

import numpy as np

weights = [90, 56, 4]
np.random.choice([1, 2, 3], p=[w / sum(weights) for w in weights])

I don't know about the complexities, but numpy is known to be a very efficient library, so maybe you can dig into its documentation and implementation.

Aryerez

If you are using the algorithm from the accepted answer, all you have to do when you change a single weight is update the sum_of_weight:

sum_of_weight -= choice_weight[index];
sum_of_weight += new_weight;
choice_weight[index] = new_weight;
eike
  • you need to rebuild the whole cumulative distribution to exploit binary search. The rebuilding part is O(n) – nz_21 Nov 19 '19 at 12:04
  • @nz_21 If you expect an answer that refers to the proposed optimization in (https://stackoverflow.com/a/1761646/4253931), you should mention that in your question. – eike Nov 19 '19 at 12:07

Why not plain, simple random.choice from a weighted list? Let me know if it works:

import random
generator = [1] * 90 + [2] * 56 + [3] * 4  # 1 (weight: 90), 2 (weight: 56), 3 (weight: 4)
random.choice(generator)
MEdwin
#include <iostream>
#include <random>

int main()
{
    std::mt19937::result_type seed = std::random_device()();
    auto engine = std::mt19937(seed);

    auto initial_weights = { 90.0, 56.0, 4.0 };
    auto distribution = std::discrete_distribution<>(initial_weights);

    // Use the original distribution
    for (auto i = 0; i != 20; ++i)
    {
        std::cout << distribution(engine) << std::endl;
    }

    std::cout << std::endl;

    // Modify the distribution temporarily when generating random numbers
    for (auto i = 0; i != 20; ++i)
    {
        auto param = std::discrete_distribution<>::param_type{ 90.0 - 4.5 * i, 56.0, 4.0 + 5.0 * i };
        std::cout << distribution(engine, param) << std::endl;
    }

    std::cout << std::endl;

    // Make a permanent change to the distribution
    auto param = std::discrete_distribution<>::param_type{ 30.0, 56.0, 40.0 };
    distribution.param(param);

    // Use the modified distribution
    for (auto i = 0; i != 20; ++i)
    {
        std::cout << distribution(engine) << std::endl;
    }

    std::cout << std::endl;

    return 0;
}
user515430