3

Hi I've noticed some strange behaviour in the Apache Maths library (version 2.2) specifically in the org.apache.commons.math.distribution.GammaDistributionImpl class although I think this will probably apply to other distributions as well.

I wanted to take samples from different gamma distributions as follows:

public static final double[] gammaSamples(final double[] shapeParameters)
{
    double[] samples = new double[shapeParameters.length];
    for (int i = 0; i < shapeParameters.length; i++)
    {
        GammaDistributionImpl gd = new GammaDistributionImpl(shapeParameters[i], 1.0d);
        try
        {
            samples[i] = gd.sample();
        }
        catch (MathException e)
        {
            e.printStackTrace();
        }
    }
    return samples;
}

However when running the code I find the samples are all suspiciously similar i.e. given

public static void main(String[] args)
{
    System.out.println(Arrays.toString(gammaSamples(new double[] { 2.0d, 2.0d, 2.0d})));
}

Some example outputs are:

[0.8732612631078758, 0.860967116242789, 0.8676088095186796]
[0.6099133517568643, 0.5960661621756747, 0.5960661621756747]
[2.1266766239021364, 2.209383544840242, 2.209383544840242]
[0.4292184700011395, 0.42083613304362544, 0.42083613304362544]

I think the problem is due to the default random number generator using the same/similar seeds for each distribution, I tested this as follows:

public static final double[] gammaSamples(final double[] shapeParameters, final Random random)
{
    double[] samples = new double[shapeParameters.length];
    for (int i = 0; i < shapeParameters.length; i++)
    {
        GammaDistributionImpl gd = new GammaDistributionImpl(shapeParameters[i], 1.0d);
        gd.reseedRandomGenerator(random.nextLong());
        try
        {
            samples[i] = gd.sample();
        }
        catch (MathException e)
        {
            e.printStackTrace();
        }
    }
    return samples;
}

This seems to fix the problem i.e. given

public static void main(String[] args)
{
    System.out.println(Arrays.toString(gammaSamples(new double[] { 2.0d, 2.0d, 2.0d }, new Random())));
}

Some example outputs are:

[2.7506981228470084, 0.49600951917542335, 6.841476090550152]
[1.7571444623500108, 1.941865982739116, 0.2611420777612158]
[6.043421570871683, 0.8852269293415297, 0.6921033738466775]
[1.3859078943455487, 0.8515111736461752, 3.690127105402944]

My question is:

What's going on? Is this a bug or was it intended for the Apache Maths distributions to act this way?

It seems strange to me that if I create separate distribution objects I have to worry what seeds they are being given and make sure that they are sufficiently different.

Another slight annoyance is that I can't seem to pass these distributions my own Random object rather they only allow the seed to be changed through the reseedRandomGenerator(long seed) method. Being able to pass them my own Random object would be quite useful when trying to reproduce results.

Thanks for any help.

Richy
  • 100
  • 1
  • 7

1 Answers1

2

By looking at the javadoc :

I saw that there is the method public double[] sample(int sampleSize) throws MathException

Generates a random sample from the distribution. The default implementation generates the sample by calling sample() in a loop.

Have you tried it ?

double[] samples = sample(shapeParameters.length);

Edit: I apologize, I saw that you calculate a new GammaDistributionImpl each time with a new alpha parameter. I guess this come from the fact that seed values are derived from the system clocks with a finite resolution, and close calls to constructor will produce identical results. Have a look at this SO question.

Here are some inputs to let you make a more in-depth investigation :

Community
  • 1
  • 1
zoom
  • 1,686
  • 2
  • 16
  • 27
  • Thanks I think you're right about this being due to seed values derived from the system clocks. I also just checked out version 3.1 of the library and they have an extra constructor `GammaDistribution(RandomGenerator rng, double shape, double scale, double inverseCumAccuracy)` which is sort of what I was looking for (to be able to pass my own random number generator). I feel like there should be a warning somewhere to not create "random" objects in a loop since they will probably be given the same seeds given how fast modern systems are. – Richy Jan 26 '13 at 23:32
  • It just *feels* like a seed problem, doesn't it! – Lucas Jan 27 '13 at 03:47
  • Well it doesn't just feel like a seed problem it almost definitely is a seed problem. Look at the code I posted, the original `gammaSamples` method doesn't work and is using default seeds, the modified `gammaSamples` method sets its own seeds but is otherwise equivalent and fixes the problem. – Richy Jan 27 '13 at 11:28