9

I'm having trouble generating random numbers that do not follow a discrete uniform distribution.

So for example, say I have 5 numbers (to keep it simple), a probability of number k being generated would be k/15. (k = 1 to 5)

My idea is to generate a random number j using rand(), and if this number j is:

1 => then number 1 is generated

2 or 3 => num 2

4 or 5 or 6 => num 3

7 or 8 or 9 or 10 => num 4

11 or 12 or 13 or 14 or 15 => num 5

Now scale this to generating 1-10, 1-100, 1-1000. Does this work the way I intend it to? I've constructed a loop that pretty much does this every time a number needs to be generated, I think it's probably inefficient since it goes up until it finds the j number generated in one of the ranges... What could be a better way to do this?

EDIT: or maybe create an array once with the proper numbers and then pull out with rand() better solution?

user1071840
  • 3,522
  • 9
  • 48
  • 74
DillPixel
  • 955
  • 1
  • 14
  • 20

3 Answers3

14

You seem to be on the right track, but C++ already has a specialized random number distribution for that, std::discrete_distribution

#include <iostream>
#include <vector>
#include <map>
#include <random>

int main()
{
    std::random_device rd;
    std::mt19937 gen(rd());

    // list of probabilities    
    std::vector<double> p = {0, 1.0/15, 2.0/15, 3.0/15, 4.0/15, 5.0/15};
    // could also be min, max, and a generating function (n/15 in your case?)
    std::discrete_distribution<> d(p.begin(), p.end());

    // some statistics
    std::map<int, int> m;
    for(int n=0; n<10000; ++n) {
        ++m[d(gen)];
    }
    for(auto p : m) {
        std::cout << p.first << " generated " << p.second << " times\n";
    }
}

online demo: http://ideone.com/jU1ll

Cubbi
  • 46,567
  • 13
  • 103
  • 169
  • The other answers assume the intended distribution follows along with the sequence of triangular numbers but the question also refers to ranges from 1-100 and 1-1000, neither of which are triangular. So the general answer does appear to be more appropriate. – shawnt00 Jul 22 '12 at 22:24
  • @shawnt00 The other answers make no such assumption about the range of outputs; to output a number in the range $1$ to $100,$ they would use the $100$th triangular number. However, for a fixed distribution that would be used many times, a method like this answer might still be a good choice. – David K Nov 01 '18 at 13:40
10

Consider that the sum s of integers from 1 to n is s = n * (n + 1) / 2. Solve for n to get n = (± sqrt(8*s + 1) - 1) / 2. We can ignore the negative square root, as we know n is positive. Thus n = (sqrt(8*s + 1) - 1) / 2.

So, plugging in integers for s between 1 and 15:

s  n
01 1.000000
02 1.561553
03 2.000000
04 2.372281
05 2.701562
06 3.000000
07 3.274917
08 3.531129
09 3.772002
10 4.000000
11 4.216991
12 4.424429
13 4.623475
14 4.815073
15 5.000000

If we take the ceiling of each computed n (the smallest integer not less than n), we get this:

s  n
01 1
02 2
03 2
04 3
05 3
06 3
07 4
08 4
09 4
10 4
11 5
12 5
13 5
14 5
15 5

Thus you can go from the uniform distribution to your distribution in constant space and constant time (no iteration and no precomputed tables):

double my_distribution_from_uniform_distribution(double s) {
    return ceil((sqrt(8*s + 1) - 1) / 2);
}

N.B. This relies on sqrt giving an exact result for a perfect square (e.g. returning exactly 7 given exactly 49). This is normally a safe assumption, because IEEE 754 requires exact rounding of square roots.

IEEE 754 doubles can represent all integers from 1 through 2^53 (and many larger integers, but not contiguously after 2^53). So this function should work correctly for all s from 1 to floor((2^53 - 1) / 8) = 1125899906842623.

rob mayoff
  • 375,296
  • 67
  • 796
  • 848
0

You can take advantage of the curious arithmetical fact that:

S(n) = 1 + 2 + 3 + ... + (n - 1) + n

or simplified:

S(n) = n * (n + 1) / 2

This lets you avoid storing the array.

Regexident
  • 29,441
  • 10
  • 93
  • 100
thb
  • 13,796
  • 3
  • 40
  • 68