
I'm trying to use boost::normal_distribution in order to generate a normal distribution with mean 0 and sigma 1.

The following code doesn't work, as some values fall outside the range [-1, 1] (and they shouldn't). Could someone point out what I am doing wrong?

#include <iostream>
#include <boost/random.hpp>
#include <boost/random/normal_distribution.hpp>

int main()
{
  boost::mt19937 rng; // I don't seed it on purpose (it's not relevant)

  boost::normal_distribution<> nd(0.0, 1.0);

  boost::variate_generator<boost::mt19937&,
                           boost::normal_distribution<> > var_nor(rng, nd);

  for (int i = 0; i < 10; ++i)
  {
    double d = var_nor();
    std::cout << d << std::endl;
  }
}

The result on my machine is:

0.213436
-0.49558
1.57538
-1.0592
1.83927
1.88577
0.604675
-0.365983
-0.578264
-0.634376

As you can see, not all values are between -1 and 1.

Thank you all in advance!

EDIT: This is what happens when you have deadlines and avoid studying the theory before doing the practice.

David
    I've forgotten almost all of my statistics, but surely the variance (which is the second parameter of the distribution's ctor) does not specify an absolute cutoff for a range? It is a measure of how spread out things are. – Jan 16 '10 at 18:55
  • @Neil Butterworth: The second parameter in the constructor is the standard deviation (square root of the variance). – jason Jan 16 '10 at 19:19
  • Well, I did say I'd forgotten almost everything! –  Jan 16 '10 at 19:47
  • Thanks for providing this example! It was helpful for getting started with the `boost::normal_distribution` class. I'm glad the statistics issue was explained below. – solvingPuzzles Jul 22 '12 at 22:00
    This is a really helpful example! But may I ask why you used `&` in `boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > var_nor(rng, nd);`? I am learning to use Boost as well. Thanks! – Vokram Aug 10 '12 at 10:35
  • @Vokram I guess that the `boost::mt19937&` in the template means to use a reference to an existing object rather than make a copy of it. – William Aug 23 '15 at 19:05

2 Answers


The following code doesn't work, as some values fall outside the range [-1, 1] (and they shouldn't). Could someone point out what I am doing wrong?

No, this is a misunderstanding of the standard deviation (the second parameter in the constructor [1]) of the normal distribution.

The normal distribution is the familiar bell curve. That curve effectively tells you the distribution of values. Values close to where the bell curve peaks are more likely than values far away (the tail of the distribution).

The standard deviation tells you how spread out the values are. The smaller the number, the more concentrated values are around the mean. The larger the number, the less concentrated values are around the mean. In the image below you see that the red curve has a variance (variance is the square of the standard deviation) of 0.2. Compare this to the green curve which has the same mean but a variance of 1.0. You can see that the values in the green curve are more spread out relative to the red curve. The purple curve has variance 5.0 and the values are even more spread out.

So, this explains why the values are not confined to [-1, 1]. It is, however, an interesting fact that for any normal distribution about 68% of the values lie within one standard deviation of the mean. So, as an interesting test for yourself, write a program that draws a large number of values from a normal distribution with mean 0 and variance 1 and counts how many lie within one standard deviation of the mean. You should get a fraction close to 68% (68.2689492137% to be a little more precise).

[Figure: normal density curves with the same mean and variances 0.2 (red), 1.0 (green) and 5.0 (purple).]

1: From the boost documentation:

normal_distribution(RealType mean = 0, RealType sd = 1);

Constructs a normal distribution with mean mean and standard deviation sd.

jason

You're not doing anything wrong. For a normal distribution, sigma specifies the standard deviation, not the range. If you generate enough samples, you will see that only about 68% of them lie in the range [mean - sigma, mean + sigma], about 95% within 2 sigma, and more than 99% within 3 sigma.

Jim Lewis