1

Scipy's quad function can be used to numerically integrate indefinite integrals. However, some functions have a rather narrow range where most of their area is (for example, likelihood functions) and quad sometimes misses it. It returns that the integral is approximately 0 when it really just missed the range of the function that isn't 0.

For example, the area under the curve for a log normal distribution from 0 to inf should be 1. Here it succeeds with a geometric mean of 1 but not 2:

from scipy.integrate import quad
from scipy.stats import lognorm
from scipy import inf

quad(lambda x: lognorm.pdf(x, 0.01, scale=1), 0, inf)
# (1.0000000000000002, 1.6886909404731594e-09)

quad(lambda x: lognorm.pdf(x, 0.01, scale=2), 0, inf)
# (6.920637959567767e-14, 1.2523928482954713e-13)

I often know beforehand approximately where the bulk of the mass is. How do I tell quad to start there? If this isn't possible, I'll accept a different tool.

drhagen
  • 8,331
  • 8
  • 53
  • 82

2 Answers2

1

The points parameter of the quad method can be used to tell it where (approximately) it should look. It can't be used with an improper integral, so the range of integration needs to be split into the finite interval up to the last point, plus an infinite tail.

points = (0.1, 1, 10, 100)
func = lambda x: lognorm.pdf(x, 0.01, scale=2)  # works for other scales too
integral = quad(func, 0, points[-1], points=points)[0] + quad(func, points[-1], np.inf)[0] 

A geometric sequence of points, like in this example, is good enough for a wide range of scales.

  • integration but not really related; any thoughts on [How to get SciPy.integrate.odeint to stop when path is closed?](https://stackoverflow.com/q/33072604/3904031) (yikes that's old!) – uhoh Mar 08 '18 at 04:45
1

If the boundaries are 0, -inf, or inf, then the first guess made by quad is always 1. This can be exploited by shifting the integral so that the waypoint is at 1. For example, shifting the log normal distribution so that its mode is at 1 doesn't change the area, but guarantees that quad finds the bulk of the distribution:

from scipy.integrate import quad
from scipy.stats import lognorm
from scipy import exp, log, inf

mode = exp(log(2) - 0.01**2)

quad(lambda x: lognorm.pdf(x + mode - 1, 0.01, scale=2), -inf, inf)
# (0.9999999999999984, 2.2700129642154882e-09)

This only works if there is only one point of interest and the bounds are -inf to inf (otherwise, shifting the function shifts the bounds, which changes the first guess). If so, then this allows the integral to be computed with a single call to quad.

drhagen
  • 8,331
  • 8
  • 53
  • 82