12

I am looking for a random number generator that can be biased. For instance, say I want a random number between 1-5, with the probability being:

1: Comes up 20% of the time
2: Comes up 10% of the time
3: Comes up 40% of the time
4: Comes up 25% of the time
5: Comes up 5% of the time

Is there anything in the standard library, or other libraries out there that would do this? Alternatively, is there an efficient way to do this myself?

cmptrer
  • 1,419
  • 4
  • 13
  • 25
  • 18
    I hope you're not writing software for casinos! – Alan May 05 '10 at 18:01
  • 4
    Haha no, I'm sure a casino would hire someone a little smarter. – cmptrer May 05 '10 at 18:05
  • 1
    From yesterday: http://stackoverflow.com/questions/2772882/c-picking-a-random-item-based-on-probabilities and that was a duplicate of scads of earlier versions of the same question (which I'm too lazy to find). The word you may have been missing in searching is "discrete", which is important as a number of the answers below apply better to continuous distributions. – dmckee --- ex-moderator kitten May 05 '10 at 20:06
  • copy and paste tested code for c# ... http://stackoverflow.com/a/33991225/294884 ... works with any array trivially – Fattie Nov 30 '15 at 04:08

11 Answers11

23

For your problem, just pick a random element from this list uniformly:

[1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5]

In general, check this answer: Weighted random numbers


In TR1 and C++0x, there is <random> header which contains the discrete_distribution class to generate such numbers, among others.

You may also want to check out GSL which contains much more random distributions (and random number generators) than the standard <random> library. (But note that GSL uses GPLv3.)

Community
  • 1
  • 1
kennytm
  • 510,854
  • 105
  • 1,084
  • 1,005
  • 4
    Maybe I should have explained my actual situation a little better. What I really need will be a random number between 1-50,000ish. Creating a list that long seems unnecessary, and unneeded. Sorry for the confusion. – cmptrer May 05 '10 at 18:01
  • 10
    That's why good engineering requires a proper set of requirements. You asked for the simple case and you got the (correct) simple answer. – Barry Wark May 05 '10 at 18:05
15

Best way's probably to just take the normal unbiased random generator then return based on the interval its value falls into.

Just an if statement that gives 1 for 0:0.2, 2 for 0.2:0.3, 3 for 0.3:0.7, 4 for 0.7:0.95 and 5 for 0.95:1. Best to make either the lower or upper limit of the interval inclusive and the other exclusive.

int biasedRandom(){
double i = randomNumber();
if(i<= 0.2){return 1;}
else if(i <= 0.3){return 2;}
else if(i <= 0.7){return 3;}
else if(i <= 0.95){return 4;}
else{return 5;}
}

Something like that.

AaronM
  • 1,577
  • 12
  • 15
  • 3
    If you have a whole lot of intervals to check, what you should do is come up with a cumulative distribution array (wasn't sure what to call it) and do a binary search on it each time to find the the number that was generated. – Justin Peel May 05 '10 at 18:07
  • 2
    Seems like this generator will return 5 always. Double/floats needed and some other math unless randomNumber() returns a value 0-1. – Xorlev May 05 '10 at 18:25
  • Yeah you were right it would have given 5 every time, didn't mean to make i an int, that was a mistake. Thanks for pointing it out. – AaronM May 05 '10 at 18:32
10

The Boost random number library provides the ability to specify different shaped distributions for your generator. It's a great library - see http://www.boost.org/doc/libs/1_42_0/libs/random/index.html.

  • 1
    Yes, don't reinvent the wheel! – Barry Wark May 05 '10 at 18:18
  • 8
    Someday I am going to start using Boost. Someday. – Michael Myers May 05 '10 at 18:20
  • 1
    It just lacks a proper documentation... would be great to know what the required concepts for the engine and the distribution are so as to be able to craft one's own without reverse engineering the library :( – Matthieu M. May 05 '10 at 18:26
  • Libraries are good, but I think it's safe to say he's a pretty new programmer, in which case, they're more of a hindrance as you need the experience at first to get used to the logic. – AaronM May 05 '10 at 18:35
  • @AaronM Writing your own random number generator is fraught with problems - using an existing, tested one is a much better bet. –  May 05 '10 at 18:37
  • @mmeyers, @matthieu: Check out TR1's random library. (Random link from Google: http://www.johndcook.com/cpp_TR1_random.html.) Likely bundled with your compiler and likely a cousin of Boost, with higher level of documentation. – Potatoswatter May 05 '10 at 20:53
4

What you are describing is the implementation of a random number generator that draws from a particular probability distribution. For example, drawing numbers from a Gaussian distribution should draw random numbers such that the probability of a particular draw, x is proportional to alt text
(source: wikimedia.org)
.

In general, the approach is to draw from a uniform random distribution and then pick the value of the desired distribution's cumulative distribution function (CDF) at that drawn location. In the case of a Normal Gaussian, draw a random number, x from a uniform distribution (this is what standard random number generators should give) and then choose alt text as the random, Gaussian distributed value. For your case, the CDF you describe is a piece-wise continuous stair-step function which could be implemented using any of the many (correct) answers you have already received.

Of course, this is all trivia. What you should be doing is using a library that already handles this for you. Statistics and random number generation are not trivial and there's no need to re-invent the wheel. See Neil's answer (and check out the Boost random number library).

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
Barry Wark
  • 107,306
  • 24
  • 181
  • 206
4

Coming late to the party on this one. Here is the C++0x answer:

#include <iostream>
#include <random>
#include <iterator>

int main()
{
    // Set up distribution
    double interval[] = {1,   2,   3,   4,   5,   6};
    double weights[] = {  .2,   .1,  .4,  .25, .05};
    std::piecewise_constant_distribution<> dist(std::begin(interval),
                                                std::end(interval),
                                                std::begin(weights));
    // Choose generator
    std::mt19937 gen;  // seed as wanted
    // Demonstrate by pouring into avg[rand-1]
    const unsigned N = 1000000;
    double avg[sizeof(weights) / sizeof(weights[0])] = {0};
    for (unsigned i = 0; i < N; ++i)
        avg[static_cast<unsigned>(dist(gen)) - 1]++;
    // Comute averages
    for (double* i = std::begin(avg); i < std::end(avg); ++i)
        *i /= N;
    // Display
    for (unsigned i = 1; i <= sizeof(avg)/sizeof(avg[0]); ++i)
        std::cout << "avg[" << i << "] = " << avg[i-1] << '\n';
}

Which for me outputs:

avg[1] = 0.199779
avg[2] = 0.100002
avg[3] = 0.400111
avg[4] = 0.250257
avg[5] = 0.049851
Howard Hinnant
  • 206,506
  • 52
  • 449
  • 577
0

Why don't you just use a regular random number generator that return number between 0.0 and 1.0, and wrap it with another function that returns a number according to your requirements?

like

double biased (double seed) {
if (seed >= 0.0 && seed <0.2) return 1;
else if  ...
}
DaClown
  • 4,171
  • 6
  • 31
  • 31
  • 2
    I would not use `seed` as an identifier for a randomly generated number, it's confusing... – Matthieu M. May 05 '10 at 18:01
  • Why not? The C rand function also takes a seed, in most cases the system time. So biased is a function that generates a random number based on a seed. – DaClown May 06 '10 at 12:38
0

Throw a random real number x in [0,1], if 0< x<0.2 return 1, if 0.2<x <0.3 return 2, etc.

See here for the general problem.

leonbloy
  • 73,180
  • 20
  • 142
  • 190
0

Kenny gave an appropriate answer tailored to your particular frequency distribution.

The more general answer works with a CDF - Cumulative Distribution Function - for the data, and uses a uniform random number to pick a value within the distribution.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
0
#include <boost/random/discrete_distribution.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/variate_generator.hpp>

#include <iostream>

int main()
{

  unsigned int seed = 42;
  boost::mt19937 generator(seed);

  // return 0 with probability 10%
  //        1                  40%
  //        2                  50%
  boost::random::discrete_distribution<int> custom_dist{1,4,5};

  boost::variate_generator<boost::mt19937&,
  boost::random::discrete_distribution<int> > rndn(generator, custom_dist);

  for (unsigned int i = 0; i<10000; i++) {
    std::cout << rndn() << std::endl;
  }

  return 0;

}

And here is a plot of the result:

Output

J0e3gan
  • 8,740
  • 10
  • 53
  • 80
sb933
  • 101
  • 7
0

I am doing to do the same thing and I found this: http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/

Seems good enough for the purpose you stated.

0

I was looking for something like this for TypeScript, but only found this question for C.

So here is a biased random number generator in TypeScript I came up with, in case anybody needs something like this in TypeScript. I am sure you can translate it to C somehow.

export async function weightedRandomItem<T>(list: { weight: number; item: T }[]): Promise<T> {
    const weightSum = sumBy(list, (item) => item.weight)
    const randomIndex = await randomIntegerBetween(1, weightSum)

    let currentIndex = 1
    for (const listItem of list) {
        if (randomIndex >= currentIndex && randomIndex < currentIndex + listItem.weight) {
            return listItem.item
        }
        currentIndex += listItem.weight
    }
    throw new Error("No item selected. Impossible.")
}

where randomIntegerBetween(minInclusive: number, maxInclusive: number) returns a random integer from the specified range (min and max inclusive) from the RNG of your choice.

sumBy() is the lodash function in this case, and it should be self-explanatory.

As input you could pass in something like:

[{
    weight: 10,
    item: 1,
},
{
    weight: 50,
    item: 2,
},
{
    weight: 30,
    item: 3,
},
{
    weight: 10,
    item: 4,
}]

Then, the result would most probably be 2.

Michael Geier
  • 1,653
  • 1
  • 16
  • 18