FloatingPointError from PyMC in sampling from a Dirichlet distribution

Question

After failing to use decorators to define a stochastic object for the "logarithm of an exponential random variable", I decided to write the code for this new distribution manually using pymc.stochastic_from_dist. The model that I am trying to implement is the first model shown here: [image: model specification]

When I try to sample log(alpha) using MCMC Metropolis with a Normal proposal distribution (the sampling method given in the model specification), I get the following error:

  File "/Library/Python/2.7/site-packages/pymc/distributions.py", line 980, in rdirichlet
    return (gammas[0]/gammas[0].sum())[:-1]

FloatingPointError: invalid value encountered in divide

When the sampling does not run into this error, the resulting histograms match the ones in the paper. My hierarchical model is:

"""
A Hierarchical Bayesian Model for Bags of Marbles

logalpha ~ logarithm of an exponential distribution with parameter lambd
beta ~ Dirichlet([black and white ball proportions]:vector of 1's)
theta ~ Dirichlet(alpha*beta(vector))

"""

import numpy as np
import pymc
from scipy.stats import expon
lambd=1.
__all__=['alpha','beta','theta','logalpha']
#------------------------------------------------------------
# Set up pyMC model: logExponential
# 1 parameter: (alpha)

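def logExp_like(x,explambda):
    """log-density of x=log(a), where a ~ Exponential(rate=explambda)"""
    # Use the explambda argument rather than the module-level lambd.
    return np.log(explambda)-explambda*np.exp(x)+x
def rlogexp(explambda, size=None):
    """random variable from logExponential"""
    # np.random.exponential takes the scale (1/rate), not the rate.
    sample=np.random.exponential(1./explambda,size)
    logSample=np.log(sample)
    return logSample
logExponential=pymc.stochastic_from_dist('logExponential',logp=logExp_like,
                                          random=rlogexp,
                                          dtype=float,  # np.float was only an alias for float
                                          mv=False)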
#------------------------------------------------------------
#Defining model parameters alpha and beta.
beta=pymc.Dirichlet('beta',theta=[1,1])
logalpha=logExponential('logalpha',lambd)

@pymc.deterministic(plot=False)
def multipar(a=logalpha,b=beta):
    out=np.empty(2)
    out[0]=(np.exp(a)*b)
    out[1]=(np.exp(a)*(1-b))
    return out
theta=pymc.Dirichlet('theta',theta=multipar)
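
As a sanity check on the custom distribution: if A ~ Exponential(rate λ) and X = log A, a change of variables gives log p(x) = log λ + x − λ·exp(x), which reduces to the `-lambd*np.exp(x)+x` used above when λ = 1. The sketch below is not part of the original model; the function names `log_exponential_logp` and `integrate_density` are made up for illustration. It checks numerically that this density integrates to 1.

```python
import numpy as np

def log_exponential_logp(x, rate):
    """Log-density of X = log(A) for A ~ Exponential(rate)."""
    return np.log(rate) + x - rate * np.exp(x)

def integrate_density(rate, lo=-40.0, hi=5.0, n=200001):
    """Trapezoidal integral of exp(logp) over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    y = np.exp(log_exponential_logp(x, rate))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

total_rate_1 = integrate_density(1.0)   # rate used in the question
total_rate_2 = integrate_density(2.5)   # an arbitrary second rate

print(total_rate_1)
print(total_rate_2)
```

Both integrals should come out very close to 1. Dropping the log λ term only shifts the log-density by a constant, which does not affect Metropolis sampling as long as λ is fixed.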

And my test sampling code is:

from pymc import Metropolis
from pymc import MCMC
from matplotlib import pyplot as plt
import HBM
import numpy as np
import pymc
import scipy
M=MCMC(HBM)
M.use_step_method(Metropolis,HBM.logalpha, proposal_sd=1.,proposal_distribution='Normal')
M.sample(iter=1000,burn=200)

When I check the values of theta passed to the gamma distribution at line 978 of distributions.py, I see that they are not zero, but they are very small. How can I prevent this floating point error?

I think there should be an exception catcher in the main code of distributions.py for PyMC to handle the cases where gammas[0] is zero! — Cupitor, Oct 17 '13 at 18:51
What is the result if you add `np.seterr(divide='ignore')` just after the imports in your test sampling code? — alko, Oct 24 '13 at 13:06
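
A note on the error category: the traceback says "invalid value encountered", which is what NumPy reports for 0/0, while the `divide` category covers a nonzero value divided by zero. The sketch below uses only NumPy, and `classify_division` is a made-up helper. It shows which category each case falls into, which suggests that `divide='ignore'` alone would probably not silence this particular error. Whether PyMC changes the error state internally was not checked here.

```python
import numpy as np

def classify_division(numerator, denominator):
    """Return the NumPy error category ('divide' or 'invalid') that
    numerator / denominator triggers, or None if it triggers neither."""
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    for category in ("divide", "invalid"):
        # Raise only for the category under test, ignore the other one.
        settings = {"divide": "ignore", "invalid": "ignore"}
        settings[category] = "raise"
        try:
            with np.errstate(**settings):
                num / den
        except FloatingPointError:
            return category
    return None

zero_over_zero = classify_division([0.0, 0.0], 0.0)  # like gammas / gammas.sum()
one_over_zero = classify_division([1.0, 2.0], 0.0)
ordinary = classify_division([1.0, 2.0], 3.0)

print(zero_over_zero)
print(one_over_zero)
print(ordinary)
```

Ignoring the `invalid` category instead would let the division through, but the result would then contain NaN values rather than a usable sample.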

Xinthral · Answer 1 · 2020-07-02T05:03:01.817

I found this nugget in their documentation:

The stochastic variable cutoff cannot be smaller than the largest element of D, otherwise D’s density would be zero. The standard Metropolis step method can handle this case without problems; it will propose illegal values occasionally, but these will be rejected.

This leads me to believe that dtype=np.float (which has essentially the same range as Python's float) may not be what you want here. The documentation says it needs to be a numpy dtype, but it really just needs to be something that converts to a numpy dtype object, and in Python 2 (correct me if I'm wrong) the numeric dtypes were fixed-size types, meaning they are the same. Maybe using the decimal module would be an option: you can set the precision to cover the expected range of values and pass it to your extended stochastic method, where it would be converted.

from decimal import Decimal, getcontext
getcontext().prec = 15
dtype=Decimal

I don't know whether this would still be truncated once the numpy library gets hold of it, or whether it would respect the inherited level of precision. I have no accurate way of testing this, but give it a try and let me know how it works for you.

Edit: I tested the notion of precision inheritance and it would seem to hold:

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 10
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571429')
>>> np.float(Decimal(1) / Decimal(7))
0.1428571429
>>> getcontext().prec = 15
>>> np.float(Decimal(1) / Decimal(7))
0.142857142857143
>>>
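
One caveat about the session above, offered as a sketch and not as a tested PyMC recipe: the differing printouts come from the Decimal division being rounded before the conversion. Once a Decimal is converted with float, the result is an ordinary 64-bit double, and a NumPy array holding Decimal objects has dtype object, not a numeric dtype. The helper name `one_seventh_as_float` is made up for illustration.

```python
from decimal import Decimal, localcontext
import numpy as np

def one_seventh_as_float(precision):
    """Compute 1/7 as a Decimal at the given precision, then convert to float."""
    with localcontext() as ctx:
        ctx.prec = precision
        return float(Decimal(1) / Decimal(7))

low = one_seventh_as_float(10)    # rounded to 10 digits before conversion
high = one_seventh_as_float(50)   # 50 digits, but float keeps only a double

# An array of Decimal values is stored as generic Python objects.
decimal_array_dtype = np.array([Decimal(1) / Decimal(7)]).dtype

print(low)
print(high == 1.0 / 7.0)
print(decimal_array_dtype)
```

So extra Decimal precision does not survive a conversion to float, and it is an open question whether PyMC's samplers would accept object-dtype values at all.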

Hey Jesse, thanks for the answer, but I hardly remember anything about PyMC these days. I wish they had a better maintenance. I have moved on, haha! — Cupitor, Jul 03 '20 at 18:06

score 0 · Answer 2 · answered Oct 28 '13 at 17:20

If you do get small numbers, they might simply be too small to represent as a float. Avoiding this is typically what working with logarithms is for. What if you use dtype=np.float64?
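
For reference, a small NumPy-only check (a sketch, with variable names chosen for illustration) suggests that switching to np.float64 would not change the representable range, because Python's float already maps to the same 64-bit dtype. It also shows the underflow that working in log space avoids.

```python
import numpy as np

# Python's float and np.float64 describe the same 64-bit type.
same_dtype = np.dtype(float) == np.dtype(np.float64)

# Smallest positive normal double.
smallest_normal = np.finfo(np.float64).tiny

# A product of two small numbers silently underflows to zero...
underflowed = 1e-200 * 1e-200

# ...while the sum of their logarithms stays finite.
log_sum = np.log(1e-200) + np.log(1e-200)

print(same_dtype)
print(smallest_normal)
print(underflowed)
print(log_sum)
```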

jaap

The point is that would need playing with the source code of PyMC! Which I prefer not! But thanks. – Cupitor Oct 28 '13 at 21:22

score 0 · Answer 3 · answered Mar 30 '20 at 22:50

As you suggested at the end of your question, the issue is that very small numbers are cast to 0 as floats. One solution could be to tweak the source code slightly: replace the division with, for example, np.divide, and use its "where" condition to explicitly replace values that are too small with a given threshold.
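
A minimal sketch of this idea, using clipping in place of the `where` argument of np.divide: `safe_normalize` and its `floor` threshold are hypothetical names and are not part of PyMC. Flooring the draws changes the sampled distribution slightly, so this is only an illustration of the guard, not a drop-in patch for rdirichlet.

```python
import numpy as np

def safe_normalize(gammas, floor=np.finfo(float).tiny):
    """Normalize non-negative draws so they sum to 1.

    Values below `floor` (a hypothetical threshold) are raised to it,
    so the sum can never be exactly zero.
    """
    values = np.maximum(np.asarray(gammas, dtype=float), floor)
    return values / values.sum()

# Raise on every floating point problem to show that none occurs.
with np.errstate(all="raise"):
    all_zero = safe_normalize([0.0, 0.0])   # would be 0/0 without the floor
    ordinary = safe_normalize([1.0, 3.0])   # unaffected by the floor

print(all_zero)
print(ordinary)
```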
