
I wrote a simulation program and noticed that the C++ function rand() seemed to generate low numbers too often, so I wrote a small test:

#include <iostream>
#include <fstream>
#include <stdio.h>
#include <vector>
#include <cstdlib>
#include <time.h>
#include <cfloat>
#include <iomanip> 

using namespace std;

int main(){
    srand(time(NULL));
   
    int qwerty=0;
    for(int i=0; i<10000000;i++){
        if(rand()%10000<2800){
            qwerty++;
        }
    }
    cout << qwerty << endl;
    return 0;
}

When I run this test loop, I consistently get a number near 3400000 (34%), which matches the 34% I had seen in my real program. The output should be near 2800000 (28%).

I then ran the same loop in a new project (the code shown here), containing only the includes and the srand(time(NULL)) call, and got the same output.

I then copied this file into an online compiler. This time, instead of 3400000, I got the expected number, 2800000.

I can't work out why this is happening. Does anyone know?

Additional info: I am using Dev-C++ as an IDE with the TDM-GCC 4.9.2 64-bit release and ISO C++11, on Windows 10. If I take the executable generated on my computer and run it on another one, I get the same 34% result. The problem also occurs if I use different numbers.

  • This is how `rand()` behaves. The interval it works in and the interval you want are not evenly divisible, so the last bucket that `rand()` can use isn't full. In other words, there are more low values for it to choose from, so it follows that they appear more often. Update all of your tooling, and use something like `std::mt19937` from `<random>`. – sweenish Mar 17 '22 at 16:11
  • `rand` and modulo are really bad for giving good distributions. If you can use C++11, use a `std::mt19937` for the random number generator and a `std::uniform_int_distribution` to get the values in the range you need. – NathanOliver Mar 17 '22 at 16:12
  • What is `RAND_MAX` on your platform? – Bathsheba Mar 17 '22 at 16:14
  • Side note: [Here's a link to a more up-to-date version of DevCpp](https://github.com/Embarcadero/Dev-Cpp/releases). You probably want to grab the version that comes bundled with the GCC9.2 toolchain, but if you want to easily keep up with the evolving GCC compiler (and get a large ecosystem of prebuilt tools) consider using one of the downloads without GCC and instead [install MSYS2 and use it to install your toolchain](https://stackoverflow.com/a/30071634/4581301). – user4581301 Mar 17 '22 at 16:17
  • My take is to just avoid dev-c++ altogether. – sweenish Mar 17 '22 at 16:18
  • [Excellent presentation on `rand`, why you shouldn't use it anymore, and how to use `<random>`](https://www.youtube.com/watch?v=LDPMpc-ENqY). – user4581301 Mar 17 '22 at 16:21
  • @user4581301: Except the generator is not the issue here: a true random generator would show a similar effect. Even experienced scientists abuse computer-generated random sequences. – Bathsheba Mar 17 '22 at 16:24
  • @Downvoters, this question is deeper than it looks on first inspection. – Bathsheba Mar 17 '22 at 16:25
  • But not much. `rand()` and modulo are the old-school PB&J of generating random numbers in a range. I don't believe you can shove `rand()` into any of the modern distributions. `rand()` in a vacuum may be perfectly acceptable, but it's naive to act like that's how it's used. My opinion is that it's rightfully earned its bad rep, and we should just move on to the newer tools or a third-party library depending on your needs. – sweenish Mar 17 '22 at 16:39
  • @Bathsheba I had checked but forgot to include it: it's 32767. – Praisethefab Mar 17 '22 at 16:44
  • @sweenish I used `rand` because it's what I was taught both in high school (for C) and at university right now (imperative C++). I'll be sure not to use it from now on. – Praisethefab Mar 17 '22 at 16:46
  • Thanks to everyone for answering – Praisethefab Mar 17 '22 at 16:50

2 Answers


For a uniformly distributed integer random variable E on the closed interval [0, 32767], the probability that mod(E, 10000) < 2800 is around 34%. Intuitively, the condition mod(E, 10000) < 2800 is favoured by the extra bucket of numbers in the range [30000, 32767]: every value in that bucket, taken modulo 10000, is less than 2800. That has the effect of pushing the result above 28%.

That's the behavior you are observing here.

It's not a function of the quality of the random generator, although the bias would shrink if you were to use a uniform generator with a larger output range (a larger RAND_MAX). Using rand() from your C++ standard library is ill-advised, as the standard is too relaxed about the function's requirements for results to be portable. <random> from C++11 will cause you far less trouble: you'd be able to avoid explicit % too.

Bathsheba
  • But it's still a very good reason to avoid `rand()` + `%`, generally. – sweenish Mar 17 '22 at 16:25
  • @Sweenish It's a good reason to treat `%` with good care with any random generator. – Bathsheba Mar 17 '22 at 16:26
  • If you want to generalize, sure. But if someone is using modulo with a Mersenne twister, that's a lot easier to call out in review. But I subscribe to the '`rand()` considered harmful' philosophy. We have better tools, just use them. – sweenish Mar 17 '22 at 16:27
  • 1
    @sweenish: The main issue with `rand()` aside from not being thread safe is that the C++ standard is remarkably flexible on its requirements. Which means that in any mathematical software you end up rolling your own version. It can be very useful though - it requires less state than MT, and is very very fast on modern platforms. Enhancements like Bays-Durham shuffling can help it attain enough statistical merit to pass even the Diehard tests. – Bathsheba Mar 17 '22 at 16:30
  • Those sound like serious issues to me. Which only reinforces my point, in my mind. If we're going to go the roll-your-own way, that's fine too. I am aware of the shortcomings of the Standard Library's PRNGs. But this just makes it sound like `rand()` shouldn't be used, ever. If I have to care enough to learn about the implementation, I would be better served with a third-party library. – sweenish Mar 17 '22 at 16:33

That's a well-known issue with %, and a rare case where it's not rand()'s fault.

For the sake of the example, consider RAND_MAX == 2. Further assume rand() is perfectly uniform. Then you get the numbers 0, 1 and 2, each equally often. Now look at this:

 int x = rand() % 2;

If the distribution of rand is

rand()   P
0       1/3 33.33333 %
1       1/3 33.33333 %
2       1/3 33.33333 %

Then the resulting distribution of x is (both 0 and 2 map to 0, only 1 maps to 1):

x       P
0       2/3 66.66666 % 
1       1/3 33.33333 %

Solution: Use the facilities provided in <random>.

Alan Birtles
463035818_is_not_an_ai