I would like to use my PyMC3 LR model to get an 80% HPD range for the value of the predicted variable y
as new data becomes available.
Thus, extrapolate a credible distribution of values for y
for a new value of x
not in my original dataset.
Model:
with pm.Model() as model_tlr:
alpha = pm.Normal('alpha', mu=0, sd=10)
beta = pm.Normal('beta', mu=0, sd=10)
epsilon = pm.Uniform('epsilon', 0, 25)
nu = pm.Deterministic('nu', pm.Exponential('nu_', 1/29) + 1)
mu = pm.Deterministic('mu', alpha + beta * x)
yl = pm.StudentT('yl', mu=mu, sd=epsilon, nu=nu, observed=y)
trace_tlr = pm.sample(50000, njobs=3)
After burnin I sample from the posterior and get an HPD
ppc_tlr = pm.sample_ppc(btrace_tlr, samples=10000, model=model_tlr)
ys = ppc_tlr['yl']
y_hpd = pm.stats.hpd(ys, alpha=0.2)
Which is great for visualizing the HPD around the central tendency (using fill_between)
But I would like to now use the model to get an HPD for y
when x=126.2
(for example) and the initial dataset didn't contain an observed x=126.2
The way I understand sampling from the posterior is that there are 10k samples for each of the available x
values in the dataset and hence there isn't a corresponding sampling in ys
for x=126.2
as it wasn't observed.
Basically, is there a way to use my model to obtain a distribution of credible values (based on the model) from a predictor value x=126.2
which only became available after the model was built?
If so, how?
Thank you
EDIT:
Found SO Post which mentions
Function under development (will likely eventually get added to pymc3) that will allow to predict posteriors for new data.
Does this exist?