
I measured the diameters of 80 fruits last year, and after checking which distribution best fits the values, I created a PyMC3 model

from pymc3 import Model, Normal, Uniform, fit

with Model() as diam_model:
    mu = Normal('mu', mu=57, sd=5.42)
    sigma = Uniform('sigma', 0, 10)

Then, as far as I understand, I "trained" the model with my prior data (the 80 values):

with diam_model:
    dist = Normal('dist', mu=mu, sd=sigma, observed=prior_data.values)

with diam_model:
    samples = fit().sample(1000)

Then I used plot_posterior on the samples, which also returns the mean and HPD.

My idea is to measure again this year, using Bayesian updating to reduce the sample size. How can I add single values and update the posterior, expecting the HPD to get smaller and smaller?

Hugo
  • Possible duplicate of [Incremental model update with PyMC3](https://stackoverflow.com/questions/40870840/incremental-model-update-with-pymc3) – merv Nov 08 '18 at 16:11
  • @merv I was trying to figure out if y0 is the new value – Hugo Nov 08 '18 at 16:14
  • Have a look at the notebook they linked to in the answer: https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/updating_priors.ipynb Main thing is you don't use a "best distribution" like you indicate, but instead extract KDE-based distributions for all variables from the trace result, then use those posterior distributions as your new priors in the next round of sampling. – merv Nov 08 '18 at 17:02
  • It might also be worth noting that if you switch to an InverseGamma prior on `sd` (or Gamma on `tau`), then your model will be conjugate, and the exact posterior then has a closed form. In that case, you can do online updating with any number of new observations and you don't need to run MCMC. [Wikipedia actually has a nice reference table](https://en.wikipedia.org/wiki/Conjugate_prior#Continuous_distributions). [This CrossValidated question](https://stats.stackexchange.com/q/237037/97653) might also be informative. – merv Nov 10 '18 at 00:01
  • @merv I guess your suggestions will take me in a new direction. Finally. After all, I just want an easy way to reduce my sample sizes - and time, of course. Time is money. – Hugo Nov 10 '18 at 11:03
  • @merv Could this be a simpler and valid approach? https://www.mhnederlof.nl/bayesnormalupdate.html – Hugo Nov 16 '18 at 10:24
  • Sure, that is identical to the first normal conjugate model on the wikipedia article. Note that it only estimates the mean and will not generate a posterior on the variance, but maybe that's sufficient for your purposes. – merv Nov 16 '18 at 16:40
  • @merv And how could it be done to generate a posterior on variance? – Hugo Nov 18 '18 at 13:05

1 Answer


Kernel Density Estimated Updated Priors

Using the other answer suggested as a duplicate, one could extract approximate versions of the priors using the code from this Jupyter notebook.

First round

I'll assume we have data from the first round of sampling, on which we impose a mean of 57.0 and a standard deviation of 5.42.

import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale
from scipy import stats

# generate data forced to match distribution indicated
Y0 = 57.0 + scale(np.random.normal(size=80))*5.42

with pm.Model() as m0:
    # let's place an informed, but broad prior on the mean
    mu = pm.Normal('mu', mu=50, sd=10)
    sigma = pm.Uniform('sigma', 0, 10)
    
    y = pm.Normal('y', mu=mu, sd=sigma, observed=Y0)
    
    trace0 = pm.sample(5000, tune=5000)

Extracting new priors from posterior

We can then use the results of this model to extract KDE posteriors on the parameters with the following code from the referenced notebook:

def from_posterior(param, samples, k=100):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, k)
    y = stats.gaussian_kde(samples)(x)
    
    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)

Second round

Now, if we have more data we can run a new model using the KDE updated priors:

Y1 = np.random.normal(loc=57, scale=5.42, size=100)

with pm.Model() as m1:
    mu = from_posterior('mu', trace0['mu'])
    sigma = from_posterior('sigma', trace0['sigma'])
    
    y = pm.Normal('y', mu=mu, sd=sigma, observed=Y1)
    
    trace1 = pm.sample(5000, tune=5000)

And similarly, one could use this trace to extract updated posterior estimates for future rounds of incoming data.
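For example, a minimal sketch of repeating the update as new batches arrive might look like the following; here `new_batches` is a hypothetical list of arrays of new measurements, and each round's trace seeds the next round's priors.

# repeat the KDE-updating round for each hypothetical incoming batch of data
trace = trace1
for Y_new in new_batches:
    with pm.Model():
        mu = from_posterior('mu', trace['mu'])
        sigma = from_posterior('sigma', trace['sigma'])
        y = pm.Normal('y', mu=mu, sd=sigma, observed=Y_new)
        trace = pm.sample(5000, tune=5000)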

Caveat Emptor

The above methodology yields approximations to the true updated priors and would be most useful in cases where conjugate priors are not possible. It should also be noted that I'm not sure to what extent such KDE-based approximations introduce errors, or how those errors propagate when the method is applied repeatedly. It's a neat trick, but one should be cautious about putting it into production without further validation of its robustness.

In particular, I would be very concerned about situations where the posterior distributions have strong correlation structures. The code provided here generates a "prior" distribution using only the marginals of each latent variable. This seems fine for simple models like this, and admittedly the initial priors also lack correlations, so it may not be a huge issue here. Generally, however, summarizing to marginals involves discarding information about how variables are related, and in other contexts this could be rather significant. For example, the default parameterization of a Beta distribution always leads to correlated parameters in the posterior, so the above technique would be inappropriate. Instead, one would need to infer a multi-dimensional KDE that incorporates all the latent variables.
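As a rough sanity check (my own suggestion, not part of the referenced notebook), one can at least inspect the posterior correlation between the parameters before trusting the product-of-marginals approximation:

# a correlation near zero suggests the marginal KDE priors lose little information
corr = np.corrcoef(trace0['mu'], trace0['sigma'])[0, 1]
print("posterior correlation(mu, sigma) = %.3f" % corr)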


Conjugate Model

However, in your particular case the expected distribution is Gaussian and those distributions have established closed-form conjugate models, i.e., principled solutions rather than approximations. I strongly recommend working through Kevin Murphy's Conjugate Bayesian analysis of the Gaussian distribution.

Normal-Inverse Gamma Model

The Normal-Inverse Gamma model estimates both the mean and the variance of an observed normal random variable. The mean is modeled with a normal prior; the variance with an inverse gamma. This model uses four parameters for the prior:

mu_0  = prior mean
nu    = number of observations used to estimate the mean
alpha = half the number of obs used to estimate variance
beta  = half the sum of squared deviations

Given your initial model, we could use the values

mu_0  = 57.0
nu    = 80
alpha = 40
beta  = alpha*5.42**2

You can then plot the log-likelihood of the prior as follows:

# points to compute likelihood at
mu_grid, sd_grid = np.meshgrid(np.linspace(47, 67, 101), 
                               np.linspace(4, 8, 101))

# normal ~ N(X | mu_0, sigma/sqrt(nu))
logN = stats.norm.logpdf(x=mu_grid, loc=mu_0, scale=sd_grid/np.sqrt(nu))

# inv-gamma ~ IG(sigma^2 | alpha, beta)
logIG = stats.invgamma.logpdf(x=sd_grid**2, a=alpha, scale=beta)

# full log-likelihood
logNIG = logN + logIG

# actually, we'll plot the -log(-log(likelihood)) to get nicer contour
plt.figure(figsize=(8,8))
plt.contourf(mu_grid, sd_grid, -np.log(-logNIG))
plt.xlabel("$\mu$")
plt.ylabel("$\sigma$")
plt.show()

[Contour plot of the Normal-Inverse-Gamma prior log-density over (mu, sigma)]

Updating parameters

Given new data, Y1, one updates the parameters as follows:

# precompute some helpful values
n = Y1.shape[0]
mu_y = Y1.mean()

# updated NIG parameters
mu_n = (nu*mu_0 + n*mu_y)/(nu + n)
nu_n = nu + n
alpha_n = alpha + n/2
beta_n = beta + 0.5*np.square(Y1 - mu_y).sum() + 0.5*(n*nu/nu_n)*(mu_y - mu_0)**2

For the sake of illustrating the change in the model, let's generate some data from a slightly different distribution and then plot the resulting posterior log-likelihood:

np.random.seed(53211277)
Y1 = np.random.normal(loc=62, scale=7.0, size=20)

which yields

[Contour plot of the updated Normal-Inverse-Gamma posterior log-density, shifted toward the new data]

Here, the 20 observations are not enough to completely move to the new location and scale I provided, but both parameters appear shifted in that direction.
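Since the question asks specifically about adding single values, here is a minimal sketch of the same update applied one observation at a time; `nig_update` is just a hypothetical helper wrapping the formulas above (with n = 1), not a library function, and the new measurements are made up for illustration.

def nig_update(mu_0, nu, alpha, beta, y_new):
    # update the Normal-Inverse-Gamma hyperparameters with a single observation
    nu_n = nu + 1
    mu_n = (nu*mu_0 + y_new)/nu_n
    alpha_n = alpha + 0.5
    beta_n = beta + 0.5*(nu/nu_n)*(y_new - mu_0)**2
    return mu_n, nu_n, alpha_n, beta_n

# start from the prior hyperparameters and feed in measurements one by one
params = (57.0, 80, 40, 40*5.42**2)
for y_new in [58.1, 55.9, 60.3]:  # made-up new measurements
    params = nig_update(*params, y_new)

mu_n, nu_n, alpha_n, beta_n = params
print("posterior mean of mu:     ", mu_n)
print("posterior mean of sigma^2:", beta_n/(alpha_n - 1))  # mean of InvGamma(alpha_n, beta_n)

Each additional value increases nu_n and alpha_n, so the posterior, and with it the HPD interval, keeps tightening as more measurements are folded in.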

merv