Pystan Posterior Uncertainty Intervals

Question

I saw on another forum that PyStan doesn’t have the same function as RStan where they use posterior_interval(), but we can use numpy.percentile() instead. I’m currently using the pystan.StanModel.optimizing() function in PyStan to get the set of parameters that maximize the posterior likelihood. I now would also like to obtain an outer 95% confidence interval for the posterior result, so I am wondering whether the numpy.percentile() function would be used with the optimizing function?

I tried finding the 95% interval of the parameter distribution, but that did not give a good confidence interval around the result. In particular, I don't consider it good since when I expect the posterior to present a multimodal distribution, the confidence interval I conduct using numpy.percentile() is inside the posterior 2-D gaussian patch.

I think the 95% interval would have to be taken from the posterior. Would I use the percentile function with the optimizing function to obtain a 95% confident posterior result?

@merv here is the link: https://github.com/stan-dev/pystan/issues/397. I wouldn't consider the confidence "good" since when I expect the posterior to present a multimodal distribution, the confidence interval I conduct using np.percentile() is inside the posterior 2-D gaussian patch. — nik jacks, Dec 11 '18 at 14:25

score 3 · Accepted Answer · answered Dec 11 '18 at 23:46

To obtain bounds on posterior estimates requires sampling the posterior, which pystan.StanModel.optimizing does not do. Instead, use the pystan.StanModel.sampling method to generate MCMC draws from the posterior.

If you just need readouts of standard confidence bounds, then the pystan.StanFit.stansummary() method might be sufficient, as this will print the 2.5%, 25%, 50%, 75%, and 97.5% quantiles for each parameter. For example,

fit = sm.sampling(...) # eight schools model
print(fit.stansummary())

Inference for Stan model: anon_model_19a09b474d1901f191444eaf8a6b8ce2.
4 chains, each with iter=10000; warmup=5000; thin=1;  post-warmup
draws per chain=5000, total post-warmup draws=20000.

           mean se_mean     sd   2.5%    25%    50%   75%  97.5%   n_eff   Rhat 
mu         7.98    0.05   5.04   -2.0   4.76   7.91  11.2   18.2   10614    1.0 
tau        6.54    0.08   5.65   0.24   2.49   5.25   8.98  20.65   4552    1.0 
eta[0]     0.39  6.7e-3   0.94  -1.53  -0.23   0.42   1.02   2.18  20000    1.0 
eta[1]   3.3e-4  6.2e-3   0.88  -1.74  -0.58-2.5e-3   0.57   1.75  20000    1.0 
eta[2]     -0.2  6.6e-3   0.93  -2.01  -0.84  -0.22   0.41   1.68  20000    1.0 
eta[3]    -0.03  6.3e-3   0.89   -1.8  -0.61  -0.03   0.56   1.75  20000    1.0 
eta[4]    -0.35  6.7e-3   0.88  -2.04  -0.94  -0.36   0.22   1.44  17344    1.0 
eta[5]    -0.22  6.6e-3    0.9  -1.96  -0.81  -0.24   0.35   1.59  18298    1.0 
eta[6]     0.34  6.8e-3   0.88  -1.43  -0.23   0.36   0.93   2.04  16644    1.0 
eta[7]     0.05  6.6e-3   0.93  -1.77  -0.58   0.04   0.66   1.88  20000    1.0 
theta[0]   11.4    0.07   8.23  -1.83   6.04  10.22  15.42  31.52  13699    1.0 
theta[1]   7.93    0.04   6.21  -4.58   4.09   7.89  11.79  20.48  20000    1.0 
theta[2]   6.17    0.06   7.72 -11.43   2.06   6.65  10.85  20.53  16367    1.0 
theta[3]   7.72    0.05   6.53  -5.29   3.68    7.7  11.75  21.04  20000    1.0 
theta[4]   5.14    0.04   6.35  -9.35   1.44   5.64   9.38  16.49  20000    1.0 
theta[5]   6.11    0.05   6.66  -8.44   2.22   6.44  10.41  18.52  20000    1.0 
theta[6]  10.63    0.05   6.76  -1.25   6.04  10.08  14.51  25.66  20000    1.0 
theta[7]    8.4    0.06   7.85  -7.56   3.89   8.17  12.76   25.3  16584    1.0 
lp__      -4.89    0.04   2.63 -10.79  -6.47  -4.66  -3.01  -0.43   5355    1.0

Or if you need specific quantiles, you would use numpy.percentile as you mentioned.

However, as you correctly observe, this is inappropriate for multimodal distributions. This case is addressed in a different answer, but note that if one expects multiple modes a priori, a mixture model is typically used to separate the modes out into distinct unimodal random variables.

score 3 · Answer 2 · answered Jan 21 '19 at 13:20

You can retrieve the percentiles you wish straight from pystan.stansummary:

percentiles = (0.025, 0.25, 0.5, 0.75, 0.975)              # edit these at will
pystan.stansummary(fit=your_fit, probs=percentiles, digits_summary=2)

This should work fine.

Pystan Posterior Uncertainty Intervals

2 Answers2