why does quad return both zeros when integrating a simple Gaussian pdf at a very small variance?

Question

when the sigma2 is large, the integration gives correct answer. but when reducing the sigma2 to 1e-8 or below, it returns 0. Any idea how to solve the problem?

I checked Matlab which gives correct answer no matter how small the sigma2 is. Thanks!

quad(lambda x: 1/sqrt(2*pi*sigma2)*exp(-x**2/(2*sigma2)), -10, 10)

Well....what happens to that exponential bit when `sigma2` gets really really small (but still positive)? — , Oct 19 '16 at 16:54
@JackManey Its contribution is balanced by `sigma2` in the denominator. The function being integrated is the Gaussian pdf, its integral is 1 regardless of how small sigma2 is. But the numerical method used by `quad` has an issue with the thin spike near zero. — , Dec 13 '16 at 16:12
Possible duplicate of [scipy.integrate.quad gives wrong result on large ranges](https://stackoverflow.com/questions/30913664/scipy-integrate-quad-gives-wrong-result-on-large-ranges) — , Oct 28 '18 at 01:51

score 1 · Answer 1 · answered Dec 13 '16 at 20:52

Short answer

Use the optional parameter points to help the integrator locate the peak of the Gaussian:

quad(lambda x: 1/sqrt(2*pi*sigma2)*exp(-x**2/(2*sigma2)), -10, 10, points=[-10*sqrt(sigma2), 10*sqrt(sigma2)])

The output is accurate even if sigma2 = 1e-100.

The meaning of points is to indicate change of behavior of the function: from being 0 (for all practical purposes) it turns and forms a very sharp peak within a few standard deviations from the mean.

A simpler way is to simply replace the interval of integration with -10*sqrt(sigma2), 10*sqrt(sigma2), but this may not be what you want if the integral involves other terms besides the Gaussian.

Explanation

The issue is that when the Gaussian peak is very thin (e.g., width 0.001 within an interval of length 20), and integration routine is likely to never see it. The integrators are generally good at adapting to the features of the functions they are given, but they cannot adapt to the features they do not see.

The good performance of MATLAB's quad on this example is mostly luck: it uses adaptive Simpson's method, which involves sampling the function exactly at the midpoint of the interval of integration. If you give it a non-symmetric interval such as (-10,11), it performs much worse than Scipy's quad. Here's a comparison (using my copy of MATLAB, which is R2013a):

quad(...,-10,10) always returns 1
quad(...,-10,11) returns 1 for sigma2=0.1, and is wrong for sigma2=0.01 and smaller.
scipy.integrate.quad(..., -10, 10) returns 1 up to 1e-4
scipy.integrate.quad(..., -10, 11) returns 1 up to 1e-3
MATLAB's integral(...) returns 1 up to 1e-5. Whether the interval is (-10,10) or (-10,11) makes no difference for it.

It's worth mentioning that MATLAB's quad is deprecated, and MathWorks recommend using integral, which as you can see does a much better job when luck is not involved.

The difference in performance between MATLAB's integral and SciPy's quad is not dramatic; for strongly localized Gaussians (relative to the interval of integration), both will need help locating the peak. The help comes in the form of points parameter of scipy.integrate.quad or WayPoints parameter of integral.

why does quad return both zeros when integrating a simple Gaussian pdf at a very small variance?

1 Answers1

Short answer

Explanation

Linked