2

Suppose that I generate some sample data using pymc3 for a gamma distribution:

import pymc3 as pm
import arviz as az

# generate fake data:
with pm.Model() as model2:
    g = pm.Gamma('g', alpha=1.7, beta=0.097)
    
syn = g.random(size=1000)
plt.hist(syn, bins=50);

enter image description here

Now, I will create a model to fit a gamma distribution on that data:

model = pm.Model()

with model: 

    # alpha
    alpha = pm.Exponential('alpha', lam=2)

    # beta
    beta = pm.Exponential('beta', lam=0.1)

    g = pm.Gamma('g', alpha=alpha, beta=beta, observed=syn)

    trace = pm.sample(2000, return_inferencedata=True)

This will correctly get the values and distribution that created the original fake data. Now, I want to plot the pdf (but I don't know how to do that!). I saw an example that did this:

with model:
    post_pred = pm.sample_posterior_predictive(trace.posterior)
# add posterior predictive to the InferenceData
az.concat(trace, az.from_pymc3(posterior_predictive=post_pred), inplace=True)

which creates a matrix that contains samples from the estimated pdfs. I plot the results with:

fig, ax = plt.subplots()
az.plot_ppc(trace, ax=ax)
ax.hist(syn, bins=100, alpha=.3, density=True, label='data')
ax.legend(fontsize=10);
plt.xlim([0,60])

which gives:

enter image description here

which is not what I'm looking for. Instead, I'd like to sample from the posterior of alpha and beta to draw many gamma pdfs. I can do that by sampling and plotting lines, but I thought this has to be something that is already implemented with pymc3 or arviz, but I just don't know it. Thanks in advance if you could tell me how to plot what I want.

OriolAbril
  • 7,315
  • 4
  • 29
  • 40
Vladimir Vargas
  • 1,744
  • 4
  • 24
  • 48

2 Answers2

3

For this particular task, I'd recommend combining xarray (ArviZ's InferenceData is based on xarray Datasets) and scipy to generate the pdfs.

If using the right dimensions so that everything broadcasts, scipy.stats.gamma.pdf can be used to generate the pdfs for specific values of alpha and beta. Given that the posterior is stored as an xarray Dataset, we can use xarray.apply_ufunc to handle the broadcasting so we can use scipy to generate the pdfs to plot.

The first step is to store the xrange as an xarray object , otherwise xarray won't know how to correctly broadcast. The second is to generate the pdfs using apply_ufunc. Note that here I am generating pdfs for every single draw, below there is also a way to select a random subset.

import scipy.stats as stats
import xarray as xr

xrange = xr.DataArray(np.linspace(0, 90, 100), dims="x")
xr.apply_ufunc(
    lambda alpha, beta, x: stats.gamma(a=alpha, scale=1/beta).pdf(x),
    trace.posterior["alpha"], 
    trace.posterior["beta"], 
    xrange
)

To quickly plot only the pdfs corresponding to a subset of the draws there are several alternatives, here is one possibility using the idea above.

# get random subset of the posterior
rng = np.random.default_rng()
idx = rng.choice(trace.posterior.alpha.size, 200)
post = trace.posterior.stack(sample=("chain", "draw")).isel(sample=idx)
pdfs = xr.apply_ufunc(
    lambda alpha, beta, x: stats.gamma(a=alpha, scale=1/beta).pdf(x),
    post["alpha"], post["beta"], xrange,
)
# plot results, for proper plotting, "x" dim must be the first
plt.plot(xrange, pdfs.transpose("x", ...));

enter image description here

OriolAbril
  • 7,315
  • 4
  • 29
  • 40
0

A solution which is extremely slow an inefficient is:

alphas = np.random.choice(trace.posterior["alpha"].data.flatten(), size=500)
betas = np.random.choice(trace.posterior["beta"].data.flatten(), size=500)
xrange = np.linspace(0, 90, 1000)
pdfs = []
for alpha, beta in zip(alphas, betas):
    with pm.Model() as gammamodel:
        gam = pm.Gamma("gam", alpha=alpha, beta=beta)
    pdf = gam.distribution.logp(xrange).eval()
    pdfs.append(np.exp(pdf))

fig, ax = plt.subplots()
ax.hist(
    data, bins=np.arange(0, len(np.unique(data))), alpha=0.3, density=True, label="data"
)
for pdf in pdfs:
    ax.plot(xrange, pdf, "grey", alpha=0.2)
Vladimir Vargas
  • 1,744
  • 4
  • 24
  • 48
  • I don't think it is correct to sample the `alphas` and `betas` from their marginals like this - it discards any correlation structure between the variables and for this parameterization of the Beta there is nearly always correlation. Instead, I think sampling the samples (say by index) and with replacement is the correct way to sample the posterior of plausible Beta distributions. – merv Aug 28 '20 at 00:09
  • 1
    Oops...sorry I misread the OP and thought you had a Beta, but see now it is Gamma. I would still check to see if the `alphas` and `betas` are correlated (not sure if this is as true in practice for Gamma as it is for Beta), and still hold that in general sampling the samples together is the correct procedure. – merv Aug 28 '20 at 00:45
  • Ok, thanks. I'll fix this. Also, I found better not to create a pymc3 model to obtain the gamma distribution. I wrote the gamma distribution with numpy and it runs a lot faster than this implementation. – Vladimir Vargas Aug 28 '20 at 13:05
  • so instead of having the kde corresponding to the posterior predictive samples you want to plot the gamma pdf of the corresponding alpha and beta for some of the draws right? – OriolAbril Aug 28 '20 at 14:04
  • Exactly @OriolAbril ! – Vladimir Vargas Aug 28 '20 at 14:13
  • 1
    I added an answer on how to do this with xarray, I hope it works and is simple enough. It's nice to see `return_inferencedata=True` being used :) – OriolAbril Aug 28 '20 at 19:05