
Sometimes we need to simulate an event with a given probability, for example a biased coin with a 75% chance of tossing heads and a 25% chance of tossing tails. Conventionally, I would do it this way:

#include <cstdlib>
#include <iostream>
#include <ctime>
using namespace std;

int main()
{
    int heads = 0, tails = 0;
    srand(time(NULL));
    int number = rand() % 100 + 1;  // generate random number 1 to 100
    if (number <= 75)               // 75% chance
        heads++;                    // this is a head
    else
        tails++;                    // this is a tail
    return 0;
}

This is working code; however, when I answered a similar biased-coin question on SO for another user, some of the users mentioned something about multiples of 100. Since the random function generates a uniform distribution, I feel that the above code is good enough for simulating a probability event.

In this past SO post, user Bathsheba mentioned something about multiples of 100: Program that simulates a coin toss with a bias coin. I wanted to know what the possible problems in my code are in relation to that.

My question is: Is the above code acceptable for creating a simulation with probability? Or are there flaws in this code which would affect the accuracy of the simulated results? If the code has flaws, what would be the correct way of implementing a probability simulation?

Edit: In a simulated test of 10,000,000 tosses, it consistently gives approximately a 75.01%-75.07% chance of tossing heads. So what problems could there be when it generates a seemingly accurate result? (The generated results didn't seem to be skewed.)

user3437460
    "Since the random function generates normal distribution" - no it doesn't. It generates a uniform distribution... – Mitch Wheat Apr 14 '14 at 08:41
    The `%` operator skews the distribution. – The Paramagnetic Croissant Apr 14 '14 at 08:41
  • @MitchWheat Yes, what I meant was uniform distribution, I will edit on that. thanks :-) – user3437460 Apr 14 '14 at 08:42
  • Bathsheba's answer to the question you linked to adequately explains why the use of % is frowned upon. – Mitch Wheat Apr 14 '14 at 08:43
  • rand() in combination with modulus has known defects; don't use it for anything important, such as a proof. – sp2danny Apr 14 '14 at 08:44
  • If you want higher quality random numbers, take look at this question: http://stackoverflow.com/questions/9471604/what-is-the-best-way-to-generate-random-numbers-in-c – merlin2011 Apr 14 '14 at 08:57
  • Even though that's not what you were focusing on (but you asked if the code is acceptable): For the simulations that I usually do, initialization with a time-based seed (which is not stored) is unacceptable because it renders the simulation irreproducible. If you later discover a bug in your code, you can't reproduce with the same seed. So better have a predetermined seed or store the seed. – Philipp Apr 14 '14 at 08:58
  • If you want a rigorous random function, you will need to use C++11 or a third-party library. – Neil Kirk Apr 14 '14 at 09:27
  • Having RAND_MAX = 2147483647, you get a single partial range of 48 biased values (0-47), which is tiny compared to the 21474836 unbiased full ranges (but it shows in your 10,000,000 tosses) –  Apr 14 '14 at 09:29
  • @user3437460 Your code would work well if the numbers between 1 and 100 all had P = 1/100 – Khaled.K Apr 14 '14 at 10:49

6 Answers


Is the above code acceptable for creating a simulation with probability? Or are there flaws in this code which would affect the accuracy of the simulated results?

Whether this is "acceptable" depends on your definition of acceptable. It is certainly not exact: operator % skews your probability, because RAND_MAX (the maximum value rand() can return) need not be of the form k * 100 + 99. If you picture the range 0..RAND_MAX split into consecutive blocks of 100 values, the last block will usually be incomplete: it yields the numbers 0, 1, ..., x but not x + 1, ..., 99, so each of 0, 1, ..., x occurs one extra time overall. The inaccuracy of this approach increases with a bigger divisor which doesn't divide the range evenly.

If the code has flaws, what would be the correct way of implementing a probability simulation?

You can use Boost, or if you can use C++11, the standard library's std::uniform_int_distribution.

4pie0
  • Just adding to this, `(INT_MAX % 100)` isn't zero. (I'm not sure how good the distribution of `rand()` is, but given that it is uniform over 0 to INT_MAX, all you need to do is use `rand() % 4` and check for less than 3; you want to mod by a power of 2.) – SGM1 Apr 14 '14 at 17:21

Due to the finite range of the generator you will always get a slightly biased result (a random number generator with a larger range gives better accuracy).

In your sample you can use a better definition of what 75% is:

#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    int heads = 0, tails = 0;
    srand(time(NULL));
    const std::size_t Samples = 10000000;
    for (std::size_t i = 0; i < Samples; ++i) {
        int head_limit = RAND_MAX * 0.75;
        int number = rand();
        if (number <= head_limit) heads++;
        else tails++;
    }
    // heads: 7498728 [0.749873]
    // tails: 2501272 [0.250127]
    std::cout
        << "heads: " << heads << " [" << double(heads) / Samples << "]\n"
        << "tails: " << tails << " [" << double(tails) / Samples << "]\n";
}

Using rand() % 100 + 1 doesn't work in the sense of "out of every 100 generated numbers, exactly 75 will be less than 75".

In other words, it provides no guarantee that among 100 randomly generated numbers, exactly 75 will be less than 75!

Sanket Patel

Is the above code acceptable for creating a simulation with probability? Or are there flaws in this code which would affect the accuracy of the simulated results?

I don't know your definition of "acceptable". However, I would avoid using rand() altogether, see for example rand() Considered Harmful.

If the code has flaws, what would be the correct way of implementing a probability simulation?

I would use std::bernoulli_distribution with a Mersenne Twister engine. It's high quality, faster (according to Stephan T. Lavavej's presentation) and standard.

By the way, the example code at std::bernoulli_distribution gives "true" 1/4 of the time and "false" 3/4 of the time. ;)

Ali

std::rand() generates a number between 0 and RAND_MAX, which is guaranteed to be at least 32767.

Suppose RAND_MAX is defined to be 32767, will rand()%100 produce a flat distribution? No. From 0 to 32699, each value from 0 to 99 will appear 327 times. But from 32700 to 32767, the values 0 to 67 appear once, and 68-99 appear 0 times. So your distribution has 328 occurrences of 00-67, and 327 occurrences of 68-99.

Further, unless you somehow specify what RAND_MAX is, or use it in your code, you will be at the mercy of whatever the compiler implementation uses for RAND_MAX, and your distribution will be skewed in some unknown manner.

If you want a coin to come up heads three-quarters of the time, consider something like this:

if ((double)std::rand() / 3.0 <= (double)RAND_MAX / 4.0)  // heads

(if a <= 3/4 * b then a/3 <= b/4). This will be nearly fair; it is unlikely that RAND_MAX will divide neatly into 4. But it will be better than the 1/327 bias in the original code.

But even better, use a better random number generator where you can set the limits.

Phil H

In order to make sure that each number between 0 and 99 has a probability of exactly P = 1/100,

instead of using raw randomly generated numbers, you can use a list of 1,000 values with 0-99 distributed evenly; then every time you need to re-use them, you shuffle them using the same random number generator.

So first we build the list:

const int SIZE = 1000;
int randList[SIZE];

Then we fill it:

void init (int randList[], const int SIZE)
{
    for (int i=0; i<SIZE; i++)
        randList[i] = i % 100;
}

Then before every 1000 trials on the coin, we shuffle the list:

void shuffle (int randList[], const int SIZE)
{
    for (int i=0; i<SIZE; i++)
        swap(randList,i,(rand() % SIZE));
}

void swap (int randList[], int a, int b)
{
    int t = randList[a];
    randList[a] = randList[b];
    randList[b] = t;
}

Then we can do trials like this:

bool trial (int randList[], const int SIZE, int trialCount)
{
    return (randList[trialCount % SIZE] < 75); // Head = True = 75%
}

Then a set of trials:

void test (bool * resultList, const int resultSize)
{
    const int SIZE = 1000;
    srand(time(NULL));
    int randList [SIZE];

    init(randList,SIZE);

    for (int i=0; i<resultSize; i++)
    {
        if (i%SIZE == 0)
            shuffle(randList,SIZE);

        resultList[i] = trial(randList,SIZE,i);
    }
}

Finally, in the main function we use the test function directly:

int main ()
{
    const int resultSize = 2000000; // 2 Million

    bool * resultList = new bool[resultSize];

    test(resultList,resultSize);

    // check sequence of outcomes

    delete[] resultList;
    return 0;
}
Khaled.K
  • You get a nice stack overflow using 10,000,000 samples (At least, use dynamic memory) –  Apr 14 '14 at 11:31