The Problem
A lot of my programming involves the statistics functions in scipy.stats. A new problem required computing the pmf of the beta-binomial distribution. Because it has an analytic form but doesn't appear in scipy.stats, I needed to define a function for its pmf myself. I am using scipy version 0.12.0 and numpy version 1.7.0.
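For reference, the analytic form I am working from is the standard beta-binomial pmf, with my K and N entering as alpha = K+1 and beta = N-K+1; this parameterization is what the code below implements:

$$
P(k \mid n, K, N) = \binom{n}{k}\,\frac{B(K+k+1,\; N-K+n-k+1)}{B(K+1,\; N-K+1)}
$$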
import numpy
from scipy.special import gammaln, betaln

def beta_binomial_pmf(k, n, K, N):
    # compute the natural log of the pmf, then exponentiate
    ln_pmf = ( gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1)
               - betaln(K+1, N-K+1) + betaln(K+k+1, N-K+n-k+1) )
    return numpy.exp(ln_pmf)
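As a sanity check of the implementation (not part of the problem itself), the same pmf can be evaluated directly for small arguments, where no log-space trick is needed. This is a minimal sketch, assuming scipy.special.binom and scipy.special.beta are available in this scipy version:

from scipy.special import binom, beta

def beta_binomial_pmf_direct(k, n, K, N):
    # direct (non-log) evaluation; only intended for small K and N,
    # where the beta function values stay within float range
    return binom(n, k) * beta(K+k+1, N-K+n-k+1) / beta(K+1, N-K+1)

# e.g. beta_binomial_pmf(2, 5, 7, 20) and beta_binomial_pmf_direct(2, 5, 7, 20)
# agree closely for small arguments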
In the statistics problem I am trying to solve, the values of n and k normally range between 0 and 100, while K and N can be as large as 1e9. My issue is that this function returns the same value for different inputs.
Examples
k = 0
n = 5
K = numpy.array([12, 10, 8])
N = 101677958
beta_binomial_pmf(k, n, K, N)
The resulting array is
array([ 0.99999928, 0.99999905, 0.99999928])
which is rather odd given that each value of K is different. To get a better sense of the similarity between the first and third values in the array:
1 - beta_binomial_pmf(k, n, K, N)
array([ 7.15255482e-07, 9.53673862e-07, 7.15255482e-07])
A really simple test of the precision of the gammaln function is 1 - (Gamma(N+1)/Gamma(N))/N. It is useful because Gamma(N+1) = N*Gamma(N), so the result is exactly 0 if you work out the algebra on paper.
N = numpy.logspace(0,10,11)
1-numpy.exp(gammaln(N+1)-gammaln(N))/N
array([ 0.00000000e+00, -1.11022302e-15, 1.90958360e-14,
-9.94537785e-13, -4.96402919e-12, 7.74684761e-11,
-1.70086167e-13, 1.45905219e-08, 2.21033640e-07,
-7.64616381e-07, 2.54126535e-06])
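Out of curiosity, the same identity could also be checked against the standard library's math.lgamma (a scalar function, so it needs a loop); this is just a sketch of the comparison I have in mind, not output I am reproducing here:

import math
lgamma_check = [1 - math.exp(math.lgamma(x+1) - math.lgamma(x)) / x for x in N]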
Question
I recognize that there is a limit to the precision one can compute with, but what happens at around N=1e7 that makes the precision of gammaln change by five orders of magnitude? Any suggestions on how to get around this problem?
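For completeness, a minimal sketch of a higher-precision cross-check I could run with mpmath (assuming mpmath is installed); this sidesteps scipy entirely rather than fixing the gammaln call, so I am treating it as a diagnostic, not a solution:

import mpmath

def beta_binomial_pmf_mp(k, n, K, N, dps=50):
    # evaluate the pmf in arbitrary precision to see how far off the float64 result is
    with mpmath.workdps(dps):
        return (mpmath.binomial(n, k)
                * mpmath.beta(K+k+1, N-K+n-k+1)
                / mpmath.beta(K+1, N-K+1))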