I'm running the following program:
    #include <iostream>
    #include <vector>
    #include <cmath>
    #include <cstdlib>
    #include <chrono>
    using namespace std;

    const int N = 200;          // Number of tests.
    const int M = 2000000;      // Number of pseudo-random values generated per test.
    const int VALS = 2;         // Number of possible values (values from 0 to VALS-1).
    const int ESP = M / VALS;   // Expected number of appearances of each value per test.

    int main() {
        for (int i = 0; i < N; ++i) {
            unsigned seed = chrono::system_clock::now().time_since_epoch().count();
            srand(seed);
            vector<int> hist(VALS, 0);
            for (int j = 0; j < M; ++j) ++hist[rand() % VALS];
            int Y = 0;
            for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
            cout << Y << endl;
        }
    }
This program performs N tests. In each test we generate M numbers between 0 and VALS-1 and count their appearances in a histogram. Finally, we accumulate in Y the errors, i.e. the absolute differences between each histogram count and its expected value. Since the numbers are generated randomly, each of them would ideally appear M/VALS times per test.
After running my program I analysed the resulting data (i.e., the 200 values of Y) and I realised that some things were happening which I cannot explain. If the program is compiled with vc++, then for fixed N and VALS (N = 200 and VALS = 2 in this case) we get different data patterns for different values of M. For some values of M the resulting data follows a normal distribution, and for others it doesn't. Moreover, these two kinds of results seem to alternate as M (the number of pseudo-random values generated in each test) increases:
- M = 10K, data is not normal:
- M = 100K, data is normal:
- and so on:
As you can see, depending on the value of M the resulting data either follows a normal distribution or follows a non-normal distribution (bimodal, dog food or kind of uniform) in which the more extreme values of Y appear more often.
This variety of results doesn't occur if we compile the program with other C++ compilers (gcc and clang). In that case, it looks like we always obtain a half-normal distribution of Y values:
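For comparison, a half-normal shape is what an ideal generator would be expected to produce. The derivation below is a sketch under stated assumptions: VALS = 2, M even, and each draw independently equal to 0 or 1 with probability 1/2. The symbols H_0 and H_1 (the two histogram counts) are introduced here for illustration, and nothing in it is a claim about any particular rand() implementation.

```latex
\begin{align*}
H_0 &\sim \mathrm{Binomial}\!\left(M, \tfrac{1}{2}\right), \qquad H_1 = M - H_0 \\
Y &= \left|H_0 - \tfrac{M}{2}\right| + \left|H_1 - \tfrac{M}{2}\right|
   = 2\,\left|H_0 - \tfrac{M}{2}\right| \\
H_0 - \tfrac{M}{2} &\approx \mathcal{N}\!\left(0, \tfrac{M}{4}\right)
   \;\Longrightarrow\; Y \approx \left|\mathcal{N}(0, M)\right| \\
\mathbb{E}[Y] &\approx \sqrt{\tfrac{2M}{\pi}} \approx 1128 \quad \text{for } M = 2\,000\,000
\end{align*}
```

Under these assumptions Y is approximately half-normal with scale parameter sqrt(M), so a histogram of Y that peaks away from zero points to draws that do not behave like independent fair coin flips.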
What are your thoughts on this? What is the explanation?
I carried out the tests through this online compiler: http://rextester.com/l/cpp_online_compiler_visual