2

I recently wrote some code that required a large number of draws from a gamma distribution. I implemented this using the standard gamma_distribution method, but am finding that occasionally (once in a blue moon) this returns a value of 1.#INF for no apparent reason.

Below is a minimal example that I find exhibits this problem. In my tests I have found that the problem usually occurs around the billionth iteration or so.

#include "stdafx.h"
#include <iostream>
#include <random>

std::random_device rd;
std::default_random_engine generator(rd());

int _tmain(int argc, _TCHAR* argv[])
{
    // create gamma distribution and random variable x
    std::gamma_distribution<double> rgamma(2.0,1.0);
    double x;

    // loop through a large number of iterations
    for (unsigned long int i=0; i<int(4e9); i++) {

        // print update to console every million iterations
        if ((i+1)%int(1e6)==0)
            std::cout << "iteration: " << (i+1)/1e6 << " million\n";

        // draw new value of x from gamma distribution
        x = rgamma(generator);

        // if x==infinity then break
        if ((1.0/x)==0) {
            std::cout << "Error at iteration " << (i+1) << ": x=" << x << "\n";
            std::cin.get();
            exit(1);
        }
    }
    // print message if reach end of loop
    std::cout << "end\n";
    std::cin.get();
    return 0;
}

I'd be very interested to know if others can replicate this problem. I don't know it this is relevant, but the program above was written as a Win32 application in Visual Studio 2010 and run on a Windows machine with 8 core Intel processor.

For the time being I've patched over this problem by just catching infinite values and making them large numbers instead. But if anyone has any insight into why/how this is occurring it would be greatly appreciated!

bobverity
  • 41
  • 4
  • As far as I can see, there is no rule that forbids infinity. I'll see if the probability matches. See what `rgamma.max()` returns. According to cplusplus "numeric_limits::max() or numeric_limits::infinity()". This is implementation defined, thus somebody else could not be able to reproduce it – WorldSEnder Jan 24 '16 at 16:34
  • I just ran your program (on 64-bit GNU/Linux, compiled with g++) for 2147 million iterations without it stopping at your `exit(1)`. It then printed `end` and exited normally (0). It's random, so maybe it could have happened... but I didn't witness it. – e0k Jan 24 '16 at 16:36
  • With your parameters, the gamma function becomes $xe^{-x}$ and integrating leads to $e^{-x}*(-x-1)$ which has to be evaluated at `numeric_limits::max()` to yield the probablity that your the return value is greater that that. Note that this is very system dependant, so you have to do it on your system – WorldSEnder Jan 24 '16 at 16:43
  • This is unrelated to your random number generator issue, but my run stopped at 2147 million iterations because the `int(4e9)` was clipped to the maximum for the signed type `int`---in my case 2^31-1 for a 32-bit signed integer. (An `int` is [not necessarily 32 bits](http://stackoverflow.com/questions/589575/what-does-the-c-standard-state-the-size-of-int-long-type-to-be).) I suggest caution with types when specifying large integers like that. – e0k Jan 24 '16 at 16:56
  • Thanks for the comments. @WorldSEnder, when I run rgamma.max() I get 1.79769e+308. Is this telling me the largest value that could be returned from the gamma_distribution? What's bugging me is that for a gamma(2.0,1.0) distribution the chance of even going above 50 is around 1e-22, so the chance of going over the limits of a double must be essentially zero. – bobverity Jan 24 '16 at 17:36
  • @e0k, thanks for the tip. I have never had to work with such large numbers before! I suppose I could use a while loop, or a couple of nested loops to crank up the iterations. – bobverity Jan 24 '16 at 17:38
  • You wouldn't necessarily need to write nested loops for that, just be sure to match the type with the variable you're comparing with, like `unsigned long(4e9)` instead of `int(4e9)` or just give the value `4000000000UL`. – e0k Jan 24 '16 at 19:26

1 Answers1

0

I made the random seed determinstic with:

//std::random_device rd;
std::default_random_engine generator(-246744094);

and can consistently reproduce this error at 832million iterations. I got it with different compilers (VS2013 32/64bit and also Intel C++ 2016 64-bit), and different machines (My desktop i7, and a cluster node Xeon E5) with identical result. My rgamma.max() agreed with Bob's too.

I then compiled it with MinGW (g++ main.cpp -o main.exe -std=c++0x -O3) - and it ran without generating any infinites.

SO - I hypothesise: it's some sort of bug in libraries that get linked through Visual Studio, regardless of whether you use the MS or Intel compiler. Not sure whether that can be reported/fixed - or whether it might just be quicker to find some reliable source code for gammas.