unbiased random number generator

Question

I am generating the random numbers in a domain by using the following code. When I plot them they look grouped at the right. I could show you my plot but I do not know how to upload it. Basically I associate a some data value to the respective point. May you tell me how can I correct it please? My complete code is

#include <iostream>
#include <cmath>
#include <fstream>
#include <sstream>
#include <string>
#include <cstdlib>
#include <cstdio>
#include <time.h>

using namespace std;
string int2string1( int l );
string int2string2( int m );
int main ()
{
    ofstream outFile;
    ofstream myimp;
    string filename;
    srand((unsigned)time(0));
    int nx = 400;
    int ny = 200;
    int i,j,ix,iy,xm,ym,nimp,nfam[nx][ny];
    float vo,rnd,rr,rad,sig,vimp[nx][ny];
    for (i=0; i<nx; i++)
    {
        for (j=0; j<ny; j++)
        {
            vimp[i][j] = 0.0;
        }
    }
    rad = 5.0;
    xm = 0;
    ym = 0;
    vo = 0.08;
    sig = 4.0;
    myimp.open("imp.dat");
    for(i=1; i<nx-1; i++)
    {
        for(j=1; j<ny-1; j++)
        {
            rnd = (random() %1000 + 1)*1.0/1000.0;
            if(rnd>0.99)
            {
                xm = random() % 398 + 1;              /***1 through 399 ***/
                ym = random() % 198 + 1;              /***1 through 199 ***/
                for(ix=xm-5; ix<=xm+5; ix++)
                {
                    for(iy=ym-5; iy<=ym+5; iy++)
                    {
                        rr = sqrt(pow(ix-xm,2.)+pow(iy-ym,2.));
                        if(rr<=rad)
                        {
                            vimp[ix][iy] = vo*1.6e-19;
                        }
                    }
                }
            }
            myimp<<i<<"\t\t"<<j<<"\t\t"<<xm<<"\t\t"<<ym<<"\t\t"<<nfam[i][j]<<"\t\t"<<vimp[i][j]*6.23e18<<"\n";
        }
    }
    myimp.close();
    return 0;
}

When I plot them the look grouped at the right. I could show you my plot but I do not know how to upload the plot. — nagendra, Nov 16 '11 at 19:27
For each pixel, there's a ~1/1000 chance you'll generate a 5-radius circle at a random position on the screen? Why the doubly-random? (Also, do you ever initialize `vimp`? — Mooing Duck, Nov 16 '11 at 19:32
Unless you have a lot of time to burn, I'd leave the random number generation to someone else, like ``. — Kerrek SB, Nov 16 '11 at 19:32
I do have to have more than 1000 accepted random numbers. Let me try. — nagendra, Nov 16 '11 at 20:01
You might want to look at [my answer](http://stackoverflow.com/questions/2129705/c-why-is-rand-anything-always-0/2129979#2129979) to an older question. — Jerry Coffin, Nov 16 '11 at 21:06

score 11 · Answer 1 · edited May 23 '17 at 10:34

11

int r = rand() % N;

Does not lead to a uniform distribution¹

Instead, I recommend just using C++ TR1 (or boost) random:

#include <random>

std::mt19937 rng(seed);
std::uniform_int_distribution<int> gen(0, N); // uniform, unbiased

int r = gen(rng);

Or to generate floating point numbers of any kind:

std::uniform_real_distribution<double> gen(-2*PI, +2*PI); // uniform, unbiased
double r = gen(rng);

_{¹ Backgound, e.g.: Using rand()}

If you're really stuck with using rand() and N which doesn't evenly divide MAX_RAND, the page has some hints on how to achieve somewhat better integer distributions using other formulae. Note that I'd steer you to André Caron's answer instead.

edited May 23 '17 at 10:34

Community

1
1

answered Nov 16 '11 at 19:32

sehe

374,641
47
450
633

`int r = rand() / ( RAND_MAX / N + 1 );` simply distributes the bias evenly. – Mooing Duck Nov 16 '11 at 19:38
1

Nice link to the "Using `rand()`" page! – André Caron Nov 16 '11 at 19:40
Don't you mean "in the range [M,N)"? – Robᵩ Nov 16 '11 at 19:45
@Rob: sorry, I wrote up the wrapper function after the rest of the answer, fixed – sehe Nov 16 '11 at 19:46
While rand() and random() are comparable in randomness these days, for best behavior in C on a wider range of systems, use random() instead of rand(). If you have access to a very recent C++ system, then the TR1/C++11 approach is best. – Randall Cook Nov 16 '11 at 20:02
Thanks Randall, however, I got the same distribution with random() too. – nagendra Nov 16 '11 at 20:24
2

Note, however, that all the wrappers you've posted here do/will introduce bias unless RAND_MAX is an exact multiple of the requested range (which is rare -- often impossible). We can only hope that those in the standard library do better. – Jerry Coffin Nov 16 '11 at 21:10
I don't believe the first mechanism gives you a uniform distribution. Fundamentally, you're trying to map `RAND_MAX` things down onto `N` things; that will never be uniform unless one is an exact divisor of the other. – Oliver Charlesworth Nov 16 '11 at 21:35
@OliCharlesworth: if you can quote a source, I'm happy to learn. I [quoted my source](http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx) and it stands in good authority with me to date. I'm not saying I invented that myself :) – sehe Nov 16 '11 at 21:40
@sehe: I could look for a source, or I could just give you a simple example! Suppose `RAND_MAX = 4` and `N = 3`. You're basically defining a function `f(x)` that maps the output of `rand` to the desired range. Suppose that `f(0) = 0, f(1) = 1, f(2) = 2`. What would `f(3)` need to be to give a uniform distribution? The only way to achieve uniformity is with rejection sampling. – Oliver Charlesworth Nov 16 '11 at 21:42
@OliCharlesworth: mmm perhaps it would need to get suppressed/skipped :) You see, I'm no expert. Let me indulge you and promote the C++ solution to the front. Thanks for sharpening my saw _even further_ – sehe Nov 16 '11 at 21:48
@sehe: Yes, if you remove the division method from your answer, I'll be happy to turn a -1 into a +1! – Oliver Charlesworth Nov 16 '11 at 21:49
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/5077/discussion-between-sehe-and-oli-charlesworth) – sehe Nov 16 '11 at 22:06
I don't think that it's possible in general for a random floating-point number to be both unbiased and uniformly distributed - these are contradictory goals because floating point numbers are not themselves uniformly distributed. – caf Nov 17 '11 at 00:15

André Caron · Accepted Answer · 2011-11-21T17:04:18.917

7

Basically, the rand() % N expression introduces a bias if RAND_MAX is not a multiple of N. It projects numbers in [0,RAND_MAX] onto the range [0,N] with a non-uniform fashion.

Suppose that RAND_MAX=4 and N=2. Then, there are 3 numbers that produce 0 (0, 2 and 4) and 2 numbers that produce 1 (1 and 3). Thus, you have 60% change of getting 0 and 40% chance of getting 1.

The correct way to implement an unbiased projection from [0,RAND_MAX] onto [0,N] is by invoking rand() repeatedly until the random value is in the desired interval. See the documentation for Random.nextInt() in Java (Credits to Oli Charlesworth for the link).

Assuming that, for sheer execution speed, you want to avoid calling rand() multiple times, the way to generate the least bias possible is to use an intermediate double number, such as:

double myrand ()
{
    return double(rand()) / double(RAND_MAX);
}

int myrand ( int max )
{
    return int(myrand() * double(max));
}

Edit: Here's a simple class that will project outputs of the rand() function in to a range [0,N] with no less bias than rand().

class BoundedRandom
{
    const int m_maximum;
    const int m_divisor;
public:
    BoundedRandom ( int maximum )
        : m_maximum(maximum),
          m_divisor(RAND_MAX/(maximum+1))
    {}

    int operator() ()
    {
        int result = rand() / m_divisor;
        while ( result > m_maximum ) {
            result = rand() / m_divisor;
        }
        return (result);
    }
};

Caution: not tested or debugged.

You can use this generator like this:

BoundedRandom randomx(398);
BoundedRandom randomy(198);
// ...
 xm = randomx() + 1; // 1 through 399
 ym = randomy() + 1; // 1 through 199

edited Nov 21 '11 at 17:04

answered Nov 16 '11 at 19:33

André Caron

44,541
12
67
125

2

"Suppose that RAND_MAX=4 and N=2." They aren't. When you plug in the real values the bias you talk of is negligible. – David Heffernan Nov 16 '11 at 19:36
3

@DavidHeffernan: when `RAND_MAX` and `N` are larger, the bias is definitely *smaller*. Whether or not this is *negligible* depends on the specific values of `RAND_MAX`, `N` and the application. – André Caron Nov 16 '11 at 20:55
This won't eliminate the bias either. – Oliver Charlesworth Nov 16 '11 at 21:38
@OliCharlesworth: Care to elaborate? There is still bias in the fact that `RAND_MAX` slots cannot be projected onto `N` slots without introducing *some* bias if (`RAND_MAX%N!=0`). However, this is probably the only fair way to project the results from `rand()` onto `N` slots. – André Caron Nov 16 '11 at 22:35
Hmmm, just followed the discussion in chat. I'll update my answer. – André Caron Nov 16 '11 at 22:39
1

@AndréCaron: Ok, -1 removed! BTW, it's possible to do better than calling rand() until it falls into the output range. For instance, if your desired range is 0->9 (inclusive), then you can sample rand() until you get a number in the range 0->32759 (inclusive), and then divide or modulo. – Oliver Charlesworth Nov 16 '11 at 23:07
I see. Indeed, using `rand()` until you fall into a range that's the largest multiple of `N` that precedes `RAND_MAX` is likely to get much better performance. Sounds cool, thanks for the hint! – André Caron Nov 16 '11 at 23:30
Dear Andre, may you tell me how I can apply this concept in my case please? My range is 0 - 299 and 0 - 399. Thanks. – nagendra Nov 21 '11 at 15:42
@Nagendra: Added an example to show how it would be implemented. Please test the code thoroughly to make sure it works before using it. – André Caron Nov 21 '11 at 17:04
Thanks all. Basically my conditions for the selecting the xm and ym made the trouble. I could fix it. Moreover, BoundedRandom class works well. – nagendra Dec 24 '11 at 21:09

score 3 · Answer 3 · answered Nov 16 '11 at 19:48

C++11 introduced random number generators that will almost certainly do a better job than rand. Here's an example using the mersenne twister algorithm, but there are others to choose from depending on the characteristics you need.

// [1, 399]
auto random_int = std::bind(std::uniform_int_distribution<int>(1,399),std::mt19937()); 

// [0, 1.0)
auto random_double = std::bind(std::uniform_real_distribution<double>(0.0,1.0),std::mt19937());

score 2 · Answer 4 · answered Nov 16 '11 at 19:26

2

rand() % 398 + 1; /*** 1 through 399 ***/ will generate numbers 1 through 398, because `rand() % 398 will be 0-397 (398 % 398 is 0). Same for the next line.

As an aside, note that using pow with a constant power of two can cost an order of magnitude more CPU than simply writing out the multiplication and should usually be avoided.

Also since you only use rnd in a single comparison against the constant 0.99 you should just work that instead as integer math, as the conversion and comparison in floating point will cost rather more than just doing an integer comparison. For example instead the two lines (rnd = and if) use if((rand() % 100) == 0) which while slightly biased should accurately indicate your intention.

answered Nov 16 '11 at 19:26

Mark B

95,107
10
109
188

@sehe No more biased than the original code, more performant, and I did indicate that it has a slight bias. Almost certainly the real problem is that 399 and 199 can never be generated. – Mark B Nov 16 '11 at 19:33
You are right. My point is each no has to be independent, am I right? My numbers look densely grouped at right than at the left. Each time I run the code I get different distribution but the trend stays the same. – nagendra Nov 16 '11 at 19:36
Yeah, granted, but the question is about the bias. We'll let the OP decide whether the real problem is indeed with those extreme values (not so sure. I don't see how that fits the complaint ('look grouped to the right') though – sehe Nov 16 '11 at 19:39
I intend to show the distribution that I get over and over again. Is there any way to present that to you please so that it would be more explicit what I am saying. – nagendra Nov 16 '11 at 20:20
Note that GCC (and probably others) can optimise `pow(x,2)`. – Oliver Charlesworth Nov 16 '11 at 22:28

score 0 · Answer 5 · answered Nov 16 '11 at 19:38

0

I've read through your code and cannot find anything to bias it to the right. If anything, there's a statistical bias to the left (the left two thirds should be favored by 0.6% ish).

answered Nov 16 '11 at 19:38

Mooing Duck

64,318
19
100
158

unbiased random number generator

5 Answers5

Linked