
After looking at several related questions/answers (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) and PyMC3's documentation, I've managed to create an MCVE of my MCMC setup (see below).

My fitted parameters are continuous and discrete, so the priors are defined using pm.Uniform and pm.DiscreteUniform (with a re-scaling applied to the latter). My likelihood function is particularly convoluted (it involves comparing the N-dimensional histograms of my observed data and some synthetic data generated using the free parameters), so I had to write it using theano's @as_op decorator.
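To make that likelihood concrete outside of theano, here is the same histogram-comparison computation as a standalone NumPy sketch on toy data (the function name binned_poisson_lkl is mine, just for this illustration):

```python
import numpy as np


def binned_poisson_lkl(obsData, synthData):
    # N-dimensional histogram of the observed data; re-use its bin
    # edges for the synthetic data so both histograms share bins.
    obsHist, edges = np.histogramdd(obsData)
    synHist, _ = np.histogramdd(synthData, bins=edges)

    obsHist_f, synHist_f = obsHist.ravel(), synHist.ravel()
    # Keep only bins where the observed count is non-zero.
    mask = obsHist_f != 0
    obsHist_f, synHist_f = obsHist_f[mask], synHist_f[mask]
    # Floor empty synthetic bins so log() stays finite.
    synHist_f[synHist_f == 0] = 0.001

    # Poisson likelihood ratio (up to terms independent of the model).
    return -2. * np.sum(synHist_f - obsHist_f * np.log(synHist_f))


rng = np.random.RandomState(12345)
obs = rng.uniform(0., 10., (1000, 3))
syn = rng.uniform(0., 10., (800, 3))
print(binned_poisson_lkl(obs, syn))
```

Per bin, the term -2 * (s - o * log(s)) is maximized when the synthetic count s equals the observed count o, so the value is largest when the two histograms match.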

The implementation below is a toy model that runs on random data, but the likelihood and parameters in my actual model are very similar.

My questions are:

  1. Is this correct? Is there anything I should be doing differently?
  2. The call to the likelihood function is just thrown in there, apparently doing nothing and connected to nothing. Is this the proper way to attach it to the model?
  3. I'm using NUTS for the continuous parameters, but since my likelihood is a numerical black box (no gradients available), I don't think NUTS should even be applicable. Since the code still runs, I'm not sure what's going on.

This is the first time I've used PyMC3 so any pointers will be really helpful.
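For context, the re-scaling of the discrete parameters is just an index-to-value mapping, shown here in isolation (plain Python; the round() guard and the gridD name are my additions for this snippet):

```python
# Each discrete parameter is sampled as an integer index 0..N and
# mapped back to a physical value with: value = min + index * step.
minD, maxD, stepD = 0.005, 0.06, 0.005

# round() guards against float truncation: (maxD - minD) / stepD can
# evaluate to 10.999... in floating point, and int() alone would
# truncate that to 10, silently dropping the last grid point.
ND = int(round((maxD - minD) / stepD))

# The full grid of physical values the sampler can reach.
gridD = [minD + i * stepD for i in range(ND + 1)]
print(ND, gridD[0], gridD[-1])
```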

import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op


def main():
    trace = bayesMCMC()
    print(pm.summary(trace))
    pm.traceplot(trace)
    plt.show()


def bayesMCMC():
    """
    Define and process the full model.
    """
    with pm.Model() as model:
        # Define uniform priors.
        A = pm.Uniform("A", lower=0., upper=5.)
        B = pm.Uniform("B", lower=10., upper=20.)
        C = pm.Uniform("C", lower=0., upper=1.)

        # Define discrete priors.
        # Define discrete priors (sampled as integer indexes, re-scaled
        # inside the likelihood). round() avoids float truncation, e.g.
        # (10. - 9.) / 0.05 evaluates to 19.999... so int() alone gives 19.
        minD, maxD, stepD = 0.005, 0.06, 0.005
        ND = int(round((maxD - minD) / stepD))
        D = pm.DiscreteUniform("D", 0, ND)
        minE, maxE, stepE = 9., 10., 0.05
        NE = int(round((maxE - minE) / stepE))
        E = pm.DiscreteUniform("E", 0, NE)

        # Is this correct??
        logp(A, B, C, D, E)

        step1 = pm.NUTS(vars=[A, B, C])
        print("NUTS")
        step2 = pm.Metropolis(vars=[D, E])
        print("Metropolis")

        trace = pm.sample(300, step=[step1, step2])

        return trace


@as_op(
    itypes=[tt.dscalar, tt.dscalar, tt.dscalar, tt.lscalar, tt.lscalar],
    otypes=[tt.dscalar])
def logp(A, B, C, D, E):
    """
    Likelihood evaluation.
    """
    # Get observed data and some extra info to re-scale the discrete parameters
    # Get the observed data and the info needed to re-scale the
    # discrete parameters.
    obsData, minD, stepD, minE, stepE = observedData()

    # Scale the discrete parameters back to physical values.
    D, E = D * stepD + minD, E * stepE + minE

    # Generate synthetic data using the sampled parameter values.
    synthData = synthetic(A, B, C, D, E)

    # Generate N-dimensional histograms for both data sets, re-using
    # the edges of the observed histogram for the synthetic one.
    obsHist, edges = np.histogramdd(obsData)
    synHist, _ = np.histogramdd(synthData, bins=edges)

    # Flatten both histograms.
    obsHist_f, synHist_f = obsHist.ravel(), synHist.ravel()
    # Remove all bins where N_bin = 0.
    binNzero = obsHist_f != 0
    obsHist_f, synHist_f = obsHist_f[binNzero], synHist_f[binNzero]
    # Assign a small value to the zero elements in synHist_f to avoid
    # issues with the log().
    synHist_f[synHist_f == 0] = 0.001

    # Compare the histograms of the observed and synthetic data via a
    # Poisson likelihood ratio.
    lkl = -2. * np.sum(synHist_f - obsHist_f * np.log(synHist_f))

    return lkl


def observedData():
    """Some 'observed' data."""
    np.random.seed(12345)
    N = 1000
    obsData = np.random.uniform(0., 10., (N, 3))

    minD, stepD = 0.005, 0.005
    minE, stepE = 9., 0.05

    return obsData, minD, stepD, minE, stepE


def synthetic(A, B, C, D, E):
    """
    Dummy function to generate synthetic data. The actual function makes use
    of the A, B, C, D, E variables (obviously).
    """
    M = np.random.randint(100, 1000)
    synthData = np.random.uniform(0., 10., (M, 3))
    return synthData


if __name__ == "__main__":
    main()