In the following code I roll a die billions of times and expect the average outcome to be exactly 3.5. However, the percentage of the time that the running average lies above 3.5 is sometimes around 5 percent and other times (with a different seed, of course) around 95 percent. Even when you go as high as 6040M throws, you never end up near 50% above and 50% below 3.5. Obviously there's a little bias in rand()...

I know that 'real random' doesn't exist, but is it really this obvious?

Typical outputs are:

Average: 3.50003 counter: 3427000000 Percentage above: 83.2554 Perc abs above counter: 50.0011
Average: 3.49999 counter: 1093000000 Percentage above: 92.6983 Perc abs above counter: 50.0003

#include <stdio.h>      /* printf, scanf, puts, NULL */
#include <stdlib.h>     /* srand, rand */
#include <time.h>       /* time */
#include <unistd.h>
#include <iostream>
using namespace std;

int main ()
{
  long long int this_nr;
  long long int counter = 0;
  long long int above_counter = 0;
  long long int below_counter = 0;
  long long int above_counter_this = 0;
  long long int below_counter_this = 0;

  long long int interval_counter = 0;

  double avg = 0.0;
  srand (time(NULL));
  srand (time(NULL));
  srand (time(NULL));
  cout.precision(6);

  while(1) {
      this_nr = rand() % 6 + 1; // 0,1,2,3,4,5 or 6
      avg = ((double) this_nr + ((double)counter * (double) avg))
          / ((double) counter+1.0);
      if (this_nr <= 3) below_counter_this++;
      if (this_nr >= 4) above_counter_this++;
      if (avg < 3.5) below_counter++;
      if (avg > 3.5) above_counter++;
      if (interval_counter >= 1000000) {
        cout << "Average: " << avg << " counter: " << counter << " Percentage above: "
                 << (double) above_counter / (double) counter * 100.0
                 << " Perc abs above counter: " << 100.0 * above_counter_this / counter
                 << "                 \r";
        interval_counter = 0;
      }
      //usleep(1);
      counter++; 
      interval_counter++;
  }
}
Deduplicator
Niels
  • Note the difference between the average of all throws and the percentage of throws above 3.5. I'm concerned about the average, that never ends up around 50% – Niels Apr 09 '15 at 15:09
  • `this_nr = rand() % 6 + 1; // 0,1,2,3,4,5 or 6` does not correctly describe the output. With that `+ 1` in your code, the output `0` is not possible. Your possible outputs are: 1, 2, 3, 4, 5, 6 only. – rossum Apr 09 '15 at 15:16
  • You are very right. The comment is the wrong part, though. My expectation is indeed 1, 2, 3, 4, 5 or 6. Thanks! – Niels Apr 09 '15 at 15:49
  • BTW, you do realize that multiple srand()s are pointless? At least they're correctly outside the loop--we have to fix that bug at least once a week. Also, you don't seem to be zeroing all the stats when you zero interval_counter. For example, above_counter and below_counter seem to keep accumulating, so your trials aren't independent. – Lee Daniel Crocker Apr 09 '15 at 16:04
  • Independent trials... how could I have forgotten :) – Niels Apr 09 '15 at 16:42
  • But no. The interval_counter is only there to show the current state once in a while. The weirdness I find is that over time, when you keep running this, above_counter / total_counter keeps moving up and down with both the C++ and the PHP rand generators (from 2% to 99% within minutes) – Niels Apr 09 '15 at 16:46
  • C++ output after 5-10 minutes: Average: 3.499811786963 counter: 12762133 Counter above perc: 3.8301199337133 Counter above abs perc 49.9866989319119 – Niels Apr 09 '15 at 16:47
  • PHP output after 5-10 minutes: Average: 3.49984 counter: 13764282 Percentage above: 25.6479 Perc abs above counter: 49.9916 – Niels Apr 09 '15 at 16:47
  • For rolling dice it's all fine: the counts for faces 1-6 look like this: – Niels Apr 09 '15 at 18:36
  • 2004679 2005234 2004875 2003993 2003967 2005338 after a few minutes, in c++ – Niels Apr 09 '15 at 18:37
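
The points raised in the comments above (seed only once, the possible outcomes are 1 to 6) can be folded into a bounded version of the experiment. This is a minimal sketch, not the asker's code: the `Stats` struct and the `simulate` function are hypothetical names introduced here, and the running average is compared against 3.5 through the exact integer test `2 * sum > 7 * n` instead of a floating-point running mean.

```cpp
#include <cassert>
#include <cstdlib>   // std::srand, std::rand

// Hypothetical container for the counters; not part of the original code.
struct Stats {
    long long throws = 0;       // total number of throws
    long long high_throws = 0;  // throws that came up 4, 5 or 6
    long long avg_above = 0;    // steps at which the running average was > 3.5
    long long avg_below = 0;    // steps at which the running average was < 3.5
    double average = 0.0;       // final average of all throws
};

Stats simulate(unsigned seed, long long throws) {
    assert(throws > 0);
    std::srand(seed);  // seed exactly once, outside the loop
    Stats s;
    long long sum = 0;
    for (long long n = 1; n <= throws; ++n) {
        int face = std::rand() % 6 + 1;  // 1, 2, 3, 4, 5 or 6
        sum += face;
        if (face >= 4) ++s.high_throws;
        // sum / n > 3.5 is the same as 2 * sum > 7 * n, in exact integers
        if (2 * sum > 7 * n) ++s.avg_above;
        if (2 * sum < 7 * n) ++s.avg_below;
    }
    s.throws = throws;
    s.average = static_cast<double>(sum) / static_cast<double>(throws);
    return s;
}
```

Calling `simulate(seed, 1000000)` with a few different seeds and printing `100.0 * s.avg_above / s.throws` next to `100.0 * s.high_throws / s.throws` gives the two percentages from the question for each run.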

1 Answer

rand() is well known to be a poor generator on many platforms, and it's often particularly bad in the low bits. Taking % 6 leans on those low bits: whether the result is even or odd is decided by the lowest bit alone. There's also a chance that you're experiencing some modulo bias, but I'd expect that effect to be relatively minor.
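
To put a number on the modulo bias, here is a small sketch that counts how many generator outputs map to each residue under `% 6`. The function name `residue_counts` is made up for this illustration, and the example assumes `RAND_MAX` is 32767, the smallest value the C standard allows; many systems use 2147483647 instead.

```cpp
#include <array>
#include <cassert>

// For inputs 0 .. range-1, count how many land on each residue 0..5 under % 6.
std::array<unsigned long long, 6> residue_counts(unsigned long long range) {
    assert(range > 0);
    std::array<unsigned long long, 6> counts{};
    const unsigned long long q = range / 6;  // every residue gets at least q
    const unsigned long long r = range % 6;  // the first r residues get one extra
    for (unsigned i = 0; i < 6; ++i)
        counts[i] = q + (i < r ? 1 : 0);
    return counts;
}
```

With `residue_counts(32768)` (that is, `RAND_MAX + 1` under the assumption above) residues 0 and 1 each occur 5462 times and the other four 5461 times, so after the `+ 1` the faces 1 and 2 are favoured by one part in roughly 5461.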

pjs
  • Thanks for your input. Do you, or does anyone, have a better suggestion? In C, preferably – Niels Apr 09 '15 at 15:29
  • When I put the seed inside the loop, the bias gets bigger but nothing more like 50-50% on progressing average – Niels Apr 09 '15 at 15:30
  • Don't ever keep reseeding a generator! Seed once, then use. – pjs Apr 09 '15 at 15:32
  • For better generators, see if `random` is available on your system. It's somewhat better, but not great. If you're on OS X you can use `arc4random`, which is quite good. Another alternative is to find and download a C implementation of Mersenne Twister. – pjs Apr 09 '15 at 15:36
  • Alright, so the best answer here is the quality of the generator? Thanks.. I tried this in PHP and seem to get less bias btw... – Niels Apr 09 '15 at 15:52
  • My comments above seem to contradict this. Maybe I don't really understand the word 'random' :) – Niels Apr 09 '15 at 17:03
  • Try a good generator like rc4 or Mersenne twister and see if your results are different. – Lee Daniel Crocker Apr 09 '15 at 17:44
  • Marsenne twister seems to be the one used in PHP by standard, according to wiki: marsenne twister... right? – Niels Apr 09 '15 at 17:46
  • Can't say, since I don't do PHP. Note that the name is "Mersenne" with an "e". – pjs Apr 09 '15 at 18:04
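
For anyone who wants to try the Mersenne Twister suggestion from the comments without downloading anything, C++11 ships one in `<random>`. The following is a sketch under that assumption (a C++11 compiler rather than plain C). `roll_counts` is a hypothetical helper name, and it tallies faces the same way as the 1-6 counts quoted in the question's comments.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

// Roll a die `throws` times with std::mt19937 and tally faces 1..6.
std::array<long long, 6> roll_counts(std::uint32_t seed, long long throws) {
    assert(throws >= 0);
    std::mt19937 gen(seed);                        // seed once, then use
    std::uniform_int_distribution<int> die(1, 6);  // avoids the % 6 modulo bias
    std::array<long long, 6> counts{};
    for (long long i = 0; i < throws; ++i)
        ++counts[die(gen) - 1];
    return counts;
}
```

Replacing `rand() % 6 + 1` in the question's loop with `die(gen)` is then enough to check whether the "percentage above" behaves any differently with this generator.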