Hi, I've noticed some strange behaviour in the Apache Commons Math library (version 2.2), specifically in the org.apache.commons.math.distribution.GammaDistributionImpl class, although I think it probably applies to other distributions as well.
I wanted to take one sample from each of several gamma distributions, as follows:
    public static final double[] gammaSamples(final double[] shapeParameters)
    {
        double[] samples = new double[shapeParameters.length];
        for (int i = 0; i < shapeParameters.length; i++)
        {
            GammaDistributionImpl gd = new GammaDistributionImpl(shapeParameters[i], 1.0d);
            try
            {
                samples[i] = gd.sample();
            }
            catch (MathException e)
            {
                e.printStackTrace();
            }
        }
        return samples;
    }
However, when I run the code, the samples are all suspiciously similar. For example, given
    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(gammaSamples(new double[] { 2.0d, 2.0d, 2.0d })));
    }
Some example outputs are:
[0.8732612631078758, 0.860967116242789, 0.8676088095186796]
[0.6099133517568643, 0.5960661621756747, 0.5960661621756747]
[2.1266766239021364, 2.209383544840242, 2.209383544840242]
[0.4292184700011395, 0.42083613304362544, 0.42083613304362544]
I think the problem is that the default random number generator uses the same or very similar seeds for each distribution. I tested this as follows:
    public static final double[] gammaSamples(final double[] shapeParameters, final Random random)
    {
        double[] samples = new double[shapeParameters.length];
        for (int i = 0; i < shapeParameters.length; i++)
        {
            GammaDistributionImpl gd = new GammaDistributionImpl(shapeParameters[i], 1.0d);
            // Give each distribution its own seed drawn from the shared Random
            gd.reseedRandomGenerator(random.nextLong());
            try
            {
                samples[i] = gd.sample();
            }
            catch (MathException e)
            {
                e.printStackTrace();
            }
        }
        return samples;
    }
This seems to fix the problem. For example, given
    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(gammaSamples(new double[] { 2.0d, 2.0d, 2.0d }, new Random())));
    }
Some example outputs are:
[2.7506981228470084, 0.49600951917542335, 6.841476090550152]
[1.7571444623500108, 1.941865982739116, 0.2611420777612158]
[6.043421570871683, 0.8852269293415297, 0.6921033738466775]
[1.3859078943455487, 0.8515111736461752, 3.690127105402944]
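To illustrate the seed hypothesis in isolation, here is a minimal sketch that uses only java.util.Random, not Commons Math. It assumes (this is a guess about the library's internals, not something confirmed here) that each new distribution seeds its generator from a value such as the clock, so that objects created close together get equal or adjacent seeds. SeedDemo and firstDouble are hypothetical names used only for this illustration.

```java
import java.util.Random;

// Illustration only: exercises java.util.Random directly, not Commons Math.
final class SeedDemo {

    // Returns the first double produced by a fresh generator with the given seed.
    static double firstDouble(long seed) {
        return new Random(seed).nextDouble();
    }

    public static void main(String[] args) {
        // Stand-in for a clock reading; fixed so that the run is repeatable.
        long base = 0L;

        // Same seed: the first values are identical.
        System.out.println(firstDouble(base) + " " + firstDouble(base));

        // Seeds one apart: the first values differ only slightly.
        System.out.println(firstDouble(base) + " " + firstDouble(base + 1));
    }
}
```

If the distributions do behave like this internally, equal seeds would explain the repeated values in the first set of outputs and adjacent seeds would explain the nearly equal ones.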
My question is:
What's going on? Is this a bug, or are the Commons Math distributions intended to behave this way?
It seems strange to me that when I create separate distribution objects, I have to worry about which seeds they are given and make sure those seeds are sufficiently different.
Another slight annoyance is that I can't seem to pass these distributions my own Random object; they only allow the seed to be changed through the reseedRandomGenerator(long seed) method. Being able to pass in my own Random object would be quite useful when trying to reproduce results.
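For reproducibility, the closest workaround seems to be deriving every per-distribution seed from one master Random with a fixed seed. The sketch below shows only the seed-derivation part, using java.util.Random; SeedPlan and seedsFrom are hypothetical names, and each returned value is meant to be passed to reseedRandomGenerator(long seed) on its own distribution object.

```java
import java.util.Random;

// Hypothetical helper: derives per-distribution seeds from a single master seed.
final class SeedPlan {

    // Draws n seeds from one master generator; the same masterSeed
    // always yields the same sequence of seeds.
    static long[] seedsFrom(long masterSeed, int n) {
        Random master = new Random(masterSeed);
        long[] seeds = new long[n];
        for (int i = 0; i < n; i++) {
            seeds[i] = master.nextLong();
        }
        return seeds;
    }

    public static void main(String[] args) {
        // Each seed would be handed to gd.reseedRandomGenerator(seed)
        // for the i-th distribution object.
        long[] seeds = seedsFrom(12345L, 3);
        for (long seed : seeds) {
            System.out.println(seed);
        }
    }
}
```

Rerunning with the same master seed reproduces the same per-distribution seeds, which is the effect I would otherwise get from passing in my own seeded Random.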
Thanks for any help.