3

What is the distribution (uniform, poisson, normal, etc.) that is generated if I did the below? The output appears to indicate a uniform distribution. But then, why do we need std::uniform_int_distribution?

int main()
{
  std::mt19937_64 generator(134);
  std::map<int, int> freq;
  const int size = 100000;
  for (int i = 0; i < size; ++i) {
    int r = generator() % size;
    freq[r]++;
  }
  for (auto f : freq) {
    std::cout << std::string(f.second, '*') << std::endl;
  }
  return 0;
}

Thanks!

cmutex
  • 1,478
  • 11
  • 24

4 Answers4

3

Because while generator() is an uniform distribution over [generator.min(), generator.max()], generator() % n is not a uniform distribution over [0, n) (unless generator.max() is an exact multiple of n, assuming generator.min() == 0).

Let's take an example: min() == 0, max() == 65'535 and n == 7.

gen() will give numbers in the range [0, 65'535] and in this range there are:

  • 9'363 numbers such that gen() % 7 == 0
  • 9'363 numbers such that gen() % 7 == 1
  • 9'362 numbers such that gen() % 7 == 2
  • 9'362 numbers such that gen() % 7 == 3
  • 9'362 numbers such that gen() % 7 == 4
  • 9'362 numbers such that gen() % 7 == 5
  • 9'362 numbers such that gen() % 7 == 6

If you are wondering where did I get these numbers think of it like this: 65'534 is an exact multiple of 7 (65'534 = 7 * 9'362). This means that in [0, 65'533] there are exactly 9'362 numbers who map to each of the {0, 1, 2, 3, 4, 5, 6} by doing gen() % 7. This leaves 65'534 who maps to 0 and 65'535 who maps to 1

So you see there is a bias towards [0, 1] than to [2, 6], i.e.

  • 0 and 1 have a slightly higher chance (9'363 / 65'536 ≈ 14.28680419921875 %)‬ of appearing than
  • 2, 3, 4, 5 and 6 (9'362 / 65'536 ≈ 14.2852783203125‬ %).

std::uniformn_distribution doesn't have this problem and uses some mathematical woodo with possibly getting more random numbers from the generator to achieve a truly uniform distribution.

bolov
  • 72,283
  • 15
  • 145
  • 224
  • Your answer is correct but `generator::max()` is typically one less than a power of 2, not an exact power of 2. Might be a bit confusing to someone trying to understand the workings of the internal source of `std::random_distribution` . – doug Nov 09 '19 at 02:05
  • @doug good point. I will update the answer when I get the time – bolov Nov 09 '19 at 11:16
2

The random engine std::mt19937_64 outputs a 64-bit number that behaves like a uniformly distributed random number. Each of the C++ random engines (including those of the std::mersenne_twister_engine family) outputs a uniformly-distributed pseudorandom number of a specific size using a specific algorithm.

Specifically, std::mersenne_twister_engine meets the RandomNumberEngine requirement, which in turn meets the UniformRandomBitGenerator requirement; therefore, std::mersenne_twister_engine outputs bits that behave like uniformly-distributed random bits.

On the other hand, std::uniform_int_distribution is useful for transforming numbers from random engines into random integers of a user-defined range (say, from 0 through 10). But note that uniform_int_distribution and other distributions (unlike random number engines) can be implemented differently from one C++ standard library implementation to another.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
1

std::mt19937_64 generates a pseudo-random mutually independent sequence of long long / unsigned long long numbers. It is supposed to be uniform but I don't know the exact details of the engine, though, it is one of the best discovered engines thus far.

By taking % n you get an approximation to pseudo-random uniform distribution over integers [0, ... ,n] - but it is inherently inaccurate. Certain numbers have slightly higher chance to occur while others have slightly lower chance depending on n. E.g., since 2^64 = 18446744073709551616 so with n=10000 first 1616 values have a slightly higher chance to occur than the last 10000-1616 values. std::uniform_distribution takes care of the inaccuracy by taking a new random number in very rare cases: say, if the number is above 18446744073709550000 for n=10000 take a new number - it would work. Though, concrete details are up to implementation.

ALX23z
  • 4,456
  • 1
  • 11
  • 18
1

One of the major accomplishments of <random> was the separation of distributions from engines.

I see it as similar to Alexander Stepanov's STL, which separated algorithms from containers through the use of iterators. For random numbers I can do an implementation of the Blum-Blum-Shub single bit generator (engine) and it will still work with all the distributions in <random>. Or, I can do a simple Linear Congruential Generator, x_{n + 1} = a * x_{n} % m, which when correctly seeded can never generate 0. Again, it will work with all the distributions. Likewise, I can write a new distribution and I don't have to worry about the peculiarities of any engine as long as I only use the interface specified by a UniformRandomBitGenerator.

In general, you should always use a distribution. Also, it is time to retire using '%' for generating random numbers.

user515430
  • 298
  • 1
  • 3
  • 7