pymc3 hierarchical model with multiple observations, not calculating likelihood during MCMC?

Question

I am trying to fit a hierarchical model using pymc3 and running into trouble where it looks like my priors are being sampled but my likelihood is not. I have tried replicating the approach described here that works for multiple observations.

I am using the mallard dataset found in the unmarked package in R, loaded as a pandas dataframe. You can grab the CSV I'm using from my GitHub page here. In brief, each row represents up to three observations (y1, y2, and y3 columns) collected along a single transect. I have tried removing rows with missing data (NaN) in any of the y columns or covariates (elevation, forest cover, survey length), which did not appear to change the issue I'm experiencing.

mallard = pd.read_csv('mallard.csv')
elev, length, forest = mallard.elev, mallard.length, mallard.forest

# transpose ys to have same shape as series of each covariate (elev, length...)
observations = mallard[['y1','y2','y3']].T
num_transects = len(mallard)

with pm.Model() as mallard_model:
    # priors
    # detection probability considered constant across all transects and plots
    prob = pm.Beta('prob', alpha=1, beta=1)
    # priors for deterministic linear model of local abundance at a transect 
    # that applies to all plots within a transect
    b0 = pm.Normal('b0', mu=0, sd=10) # intercept
    b1 = pm.Normal('b1', mu=0, sd=10) # elevation
    b2 = pm.Normal('b2', mu=0, sd=10) # forest cover
    b3 = pm.Normal('b3', mu=0, sd=10) # survey length

    # linear model of local abundance using log link
    lam = pm.Deterministic('lam', pm.math.exp(b0 + b1*elev + b2*forest + b3*length))

    # likelihood of observations
    # Ni is abundance at a transect, with binomial distribution across 
    # plots within a transect

    Ni = pm.Poisson('Ni', mu=lam, shape=num_transects)
    Y_obs = pm.Binomial('Y_obs', n=Ni, p=prob, observed=observations)

    # inference, use default step functions for each parameter
    trace = pm.sample(draws=5000, init='ADVI', n_init=10000)#, step=pm.Metropolis())

    plt.figure(figsize=(7, 7))
    pm.traceplot(trace[100:]) # leave out first 100 draws as burn-in
    plt.tight_layout()

This is what comes out:
Assigned NUTS to prob_logodds_
Assigned NUTS to b0
Assigned NUTS to b1
Assigned NUTS to b2
Assigned NUTS to b3
Assigned Metropolis to Ni
Assigned Metropolis to Y_obs_missing
100%|████████████████████████████████| 5000/5000 [00:07<00:00, 675.88it/s]

I noticed there is no initialization output being reported using ADVI, which seems disconcerting.

I also found that if I force the model to use a particular step function (e.g., the pm.Metropolis() I've commented out in the call to pm.sample above), I do get sampling on some--but not all--parameters. Still no initialization reported either.

Any ideas how to get this working?

score 0 · Answer 1 · answered Mar 06 '17 at 10:08

0

I had a similar problem and it was just that my observation datas was not in the format number of observations x number of variables. As a result only the first observation was used.

Try observations = mallard[['y1','y2','y3']]

If you want to sample your posterior use sample_ppc

answered Mar 06 '17 at 10:08

Quantic

33
6

When I don't transpose the y-values, I get this error: `ValueError: Input dimension mis-match. (input[0].shape[1] = 239, input[1].shape[1] = 3)`. Running into this error is what led me to transposing in the first place. – David Diaz Mar 10 '17 at 18:25
aseyboldt is right you should care about Ni>observed and your probablity should be in [0,1] I guess. Also maybe you have outlier in your observed that make the convergence difficult. – Quantic Mar 15 '17 at 10:04
Probability is defined as a Beta-distributed random variable, which should keep it [0,1]. Haven't gotten back to re-running this yet, but am hopeful the starting values are the problem (per @aseyboldt), as these seem like solid leads. – David Diaz Mar 15 '17 at 14:05

score 0 · Answer 2 · answered Mar 13 '17 at 12:17

0

I think you have several problems in your model:

Discrete parameters do not work with advi and nuts, and I doubt the metropolis sampler can deal with all those discrete parameters either. In many cases you can marginalize them out, but here you might want to use a continuous variable for the population size instead. Maybe something like this (which also takes care of the fact that population sizes lower than the observed number are impossible)
```
sd = pm.HalfCauchy('Ni_sd', beta=2.5)
trafo = pm.distributions.transforms.lowerbound(observations.max(axis=0))
Ni = pm.Gamma('Ni', mu=lam, sd=sd, shape=num_transects,
              transform=trafo, testval=observations.max(axis=0) + 1)
```
I think your model is unidentifiable: Couldn't you always just increase all population sizes and decrease prop? I can't see how this model could learn anything useful about prop. Where would this information come from?
Why would the population size depend on the survey length? Shouldn't this influence prop? Or is it an area?

answered Mar 13 '17 at 12:17

aseyboldt

1,090
8
14

In this field survey, "length" was how much time was spent at the plot counting birds, and longer survey length would presumably lead to larger counts, all else being equal. – David Diaz Mar 15 '17 at 14:12
The exponentiation of the linear model to calculate lambda should keep it positive and constrain Ni~Poisson(lambda) positive as well. I do think I should probably provide starting values for Ni that are larger than lambda, though. – David Diaz Mar 15 '17 at 14:18
Maybe I just don't understand the model, but I thought Ni are the true population sizes, of which you only observe a proportion prop. Then looking for a longer time should increase prop, not Ni. After all, the number of birds does not increase just because you look harder. But anyway, I don't see where information about prob should come from. If you had multiple observations for some animals, then I guess this could work (if you start seeing the same animals over and over, then you seem to have exhausted the population). – aseyboldt Mar 15 '17 at 16:47
Ups, I guess if length is the length of the trail then really the number of animals should be larger for longer trails. Sorry about that. – aseyboldt Mar 15 '17 at 16:48
Prob is the probability of successfully detecting each bird over Ni "tries", which is how it's being used in the Binomial distribution for Y_obs. In a more elaborate form of this kind of model, prob could vary based on local environmental covariates (e.g., forest cover %) or time of year. In the version I've posted, it would be constant for this species, and could be compared to estimates of prob for other species, for example. – David Diaz Mar 15 '17 at 17:21
I misspoke earlier about starting values for Ni... What I should've said is that I would need to make sure that Ni starts above maximum Y_obs. – David Diaz Mar 15 '17 at 17:46

pymc3 hierarchical model with multiple observations, not calculating likelihood during MCMC?

2 Answers2