I am seeing that in Pystan, an HDI function can be used to provide a 95% credible interval surrounding the posterior distribution. However, they say it will only work for unimodal distributions. If my model may have a multimodal distribution (up to 4 peaks), is there a way I can find the HDI in Pystan? Thanks!
1 Answer
I wouldn't consider this a Stan/PyStan-specific issue. The Highest Density Interval is by definition a single interval and is therefore inappropriate for characterizing multimodal distributions. There is a nice paper by Rob Hyndman, Computing and Graphing Highest Density Regions, that extends the concept to multimodal distributions, and it has been implemented in R in the hdrcde package.
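If you just want the basic idea rather than the full hdrcde machinery, a highest density region can be approximated directly from posterior samples by thresholding a kernel density estimate. Here is a minimal sketch of that approach (my own illustration, not part of hdrcde or PyStan; the function name `hdr_from_samples`, the grid size, and the padding are arbitrary choices):
import numpy as np
from scipy.stats import gaussian_kde

def hdr_from_samples(samples, alpha=0.05, grid_size=2000):
    """Approximate the (1 - alpha) highest density region of 1-D samples.

    Returns a list of (lo, hi) tuples; a multimodal density can yield
    several disjoint intervals.
    """
    kde = gaussian_kde(samples)
    lo, hi = samples.min(), samples.max()
    pad = 0.1 * (hi - lo)
    grid = np.linspace(lo - pad, hi + pad, grid_size)
    dens = kde(grid)

    # Find the density cutoff such that the mass above it is (1 - alpha):
    # sort grid densities from highest to lowest and accumulate normalized mass.
    order = np.argsort(dens)[::-1]
    mass = np.cumsum(dens[order]) / dens.sum()
    cutoff = dens[order][np.searchsorted(mass, 1 - alpha)]

    # Collect contiguous runs of grid points whose density exceeds the cutoff.
    intervals = []
    start = None
    for point, flag in zip(grid, dens >= cutoff):
        if flag and start is None:
            start = point
        elif not flag and start is not None:
            intervals.append((start, point))
            start = None
    if start is not None:
        intervals.append((start, grid[-1]))
    return intervals
The hdrcde package and the function discussed below handle bandwidth selection and edge cases more carefully, so treat this only as an illustration of the concept.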
As for Python, there's a discussion of this on the PyMC Discourse site, where it is recommended to use a function (`hpd_grid`) that Osvaldo Martin wrote for his "Bayesian Analysis with Python" book. The source for that function is in the hpd.py file, and for a 95% region it would be used like
from hpd import hpd_grid
intervals, x, y, modes = hpd_grid(samples, alpha=0.05)
where `samples` are the posterior samples of one of your parameters, and `intervals` is a list of tuples representing the regions of highest density.
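For instance, continuing the call above, the regions can be reported alongside their modes (this assumes `hpd_grid` returns one mode per interval, as it does in the version of hpd.py I've seen):
for (lo, hi), mode in zip(intervals, modes):
    print(f"mode {mode}: 95% HPD region [{lo:.2f}, {hi:.2f}]")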
Example with Plot
Here's an example plot using some fake multimodal data.
import numpy as np
from matplotlib import pyplot as plt
from hpd import hpd_grid
# include two modes
samples = np.random.normal(loc=[-4,4], size=(1000, 2)).flatten()
# compute high density regions
hpd_mu, x_mu, y_mu, modes_mu = hpd_grid(samples)
plt.figure(figsize=(8,6))
# raw data
plt.hist(samples, density=True, bins=29, alpha=0.5)
# estimated distribution
plt.plot(x_mu, y_mu)
# high density intervals
for (x0, x1) in hpd_mu:
    plt.hlines(y=0, xmin=x0, xmax=x1, linewidth=5)
    plt.axvline(x=x0, color='grey', linestyle='--', linewidth=1)
    plt.axvline(x=x1, color='grey', linestyle='--', linewidth=1)
# modes
for xm in modes_mu:
    plt.axvline(x=xm, color='r')
plt.show()
Cautionary Note
Note that multimodal posterior distributions for properly modeled parameters are fairly rare, but they show up frequently in non-converged MCMC sampling, especially when multiple chains are used (which is best practice). If one expects multimodality a priori, that usually leads to some form of mixture model, which would eliminate the multimodality. If one does not expect multimodality, but the posteriors exhibit it anyway, that is a red flag and a reason to be skeptical of the results.
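As a rough illustration of how to check for that, you can look at the split R-hat diagnostic and trace plots with ArviZ (here `fit` is assumed to be the result of `StanModel.sampling()` and "theta" is a placeholder parameter name):
import arviz as az

# Convert the PyStan fit to ArviZ's InferenceData and summarize the chains.
# R-hat values noticeably above 1.01 suggest the chains have not mixed,
# which can show up as spurious multimodality in the pooled samples.
idata = az.from_pystan(posterior=fit)
print(az.summary(idata, var_names=["theta"]))

# Trace plots make non-mixing chains easy to spot visually.
az.plot_trace(idata, var_names=["theta"])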

- Hi. Will this work if I am using the `fit.optimizing()` function in Pystan to obtain my posterior? – nik jacks Dec 11 '18 at 14:37
- @nikjacks `pystan.StanModel.optimizing()` only generates a MAP point estimate of the parameters, so no. Use `pystan.StanModel.sampling()` to obtain samples from your posterior. – merv Dec 11 '18 at 16:50
- Thanks! So would it make sense to use the optimizing function to generate the parameters that maximize the posterior for my result, and then use the `hpd_grid()` function with `StanModel.sampling()` to generate the parameters that will give a confidence interval around the result? – nik jacks Dec 11 '18 at 17:15
- @nikjacks I wouldn't use `optimizing` at all; in fact, I only ever use it in diagnostic stages of model development. It only returns a single value per parameter, and since you expect multiple modes, that value could change depending on your random initialization. Moreover, MAP estimates (while better than MLEs) often lie outside the typical set of high-dimensional distributions and therefore are not representative. The `modes` list returned from `hpd_grid` is likely what you're after, and the `intervals` are the bounds around them. (A sketch of this sampling-based workflow is shown below, after the comments.) – merv Dec 11 '18 at 18:05
- Is there code or a gist to do the same for `scipy` distributions? Namely for the `rv_continuous` class in `scipy`. – Mark Jan 04 '23 at 19:14
- @Mark this code is agnostic to where the samples come from - it will work for any NumPy array of numbers. – merv Jan 05 '23 at 02:15
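To make the workflow discussed in the comments concrete, here is a rough sketch using PyStan 2 (the model code, data, and parameter name "mu" are placeholders):
import pystan
from hpd import hpd_grid

# Sample from the posterior instead of optimizing, so we get full draws.
sm = pystan.StanModel(model_code=model_code)  # model_code holds your Stan program
fit = sm.sampling(data=data, iter=2000, chains=4)

# Extract the draws for one parameter and pass them to hpd_grid.
mu_samples = fit.extract()["mu"]
intervals, x, y, modes = hpd_grid(mu_samples, alpha=0.05)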