2

I am trying to fit a hierarchical model using pymc3 and running into trouble where it looks like my priors are being sampled but my likelihood is not. I have tried replicating the approach described here that works for multiple observations.

I am using the mallard dataset found in the unmarked package in R, loaded as a pandas dataframe. You can grab the CSV I'm using from my GitHub page here. In brief, each row represents up to three observations (y1, y2, and y3 columns) collected along a single transect. I have tried removing rows with missing data (NaN) in any of the y columns or covariates (elevation, forest cover, survey length), which did not appear to change the issue I'm experiencing.

mallard = pd.read_csv('mallard.csv')
elev, length, forest = mallard.elev, mallard.length, mallard.forest

# transpose ys to have same shape as series of each covariate (elev, length...)
observations = mallard[['y1','y2','y3']].T
num_transects = len(mallard)

with pm.Model() as mallard_model:
    # priors
    # detection probability considered constant across all transects and plots
    prob = pm.Beta('prob', alpha=1, beta=1)
    # priors for deterministic linear model of local abundance at a transect 
    # that applies to all plots within a transect
    b0 = pm.Normal('b0', mu=0, sd=10) # intercept
    b1 = pm.Normal('b1', mu=0, sd=10) # elevation
    b2 = pm.Normal('b2', mu=0, sd=10) # forest cover
    b3 = pm.Normal('b3', mu=0, sd=10) # survey length

    # linear model of local abundance using log link
    lam = pm.Deterministic('lam', pm.math.exp(b0 + b1*elev + b2*forest + b3*length))

    # likelihood of observations
    # Ni is abundance at a transect, with binomial distribution across 
    # plots within a transect

    Ni = pm.Poisson('Ni', mu=lam, shape=num_transects)
    Y_obs = pm.Binomial('Y_obs', n=Ni, p=prob, observed=observations)

    # inference, use default step functions for each parameter
    trace = pm.sample(draws=5000, init='ADVI', n_init=10000)#, step=pm.Metropolis())

    plt.figure(figsize=(7, 7))
    pm.traceplot(trace[100:]) # leave out first 100 draws as burn-in
    plt.tight_layout()

This is what comes out:
Assigned NUTS to prob_logodds_
Assigned NUTS to b0
Assigned NUTS to b1
Assigned NUTS to b2
Assigned NUTS to b3
Assigned Metropolis to Ni
Assigned Metropolis to Y_obs_missing
100%|████████████████████████████████| 5000/5000 [00:07<00:00, 675.88it/s]

Traceplot

I noticed there is no initialization output being reported using ADVI, which seems disconcerting.

I also found that if I force the model to use a particular step function (e.g., the pm.Metropolis() I've commented out in the call to pm.sample above), I do get sampling on some--but not all--parameters. Still no initialization reported either.

Any ideas how to get this working?

David Diaz
  • 437
  • 4
  • 11

2 Answers2

0

I had a similar problem and it was just that my observation datas was not in the format number of observations x number of variables. As a result only the first observation was used.

Try observations = mallard[['y1','y2','y3']]

If you want to sample your posterior use sample_ppc

Quantic
  • 33
  • 6
  • When I don't transpose the y-values, I get this error: `ValueError: Input dimension mis-match. (input[0].shape[1] = 239, input[1].shape[1] = 3)`. Running into this error is what led me to transposing in the first place. – David Diaz Mar 10 '17 at 18:25
  • aseyboldt is right you should care about Ni>observed and your probablity should be in [0,1] I guess. Also maybe you have outlier in your observed that make the convergence difficult. – Quantic Mar 15 '17 at 10:04
  • Probability is defined as a Beta-distributed random variable, which should keep it [0,1]. Haven't gotten back to re-running this yet, but am hopeful the starting values are the problem (per @aseyboldt), as these seem like solid leads. – David Diaz Mar 15 '17 at 14:05
0

I think you have several problems in your model:

  • Discrete parameters do not work with advi and nuts, and I doubt the metropolis sampler can deal with all those discrete parameters either. In many cases you can marginalize them out, but here you might want to use a continuous variable for the population size instead. Maybe something like this (which also takes care of the fact that population sizes lower than the observed number are impossible)

    sd = pm.HalfCauchy('Ni_sd', beta=2.5)
    trafo = pm.distributions.transforms.lowerbound(observations.max(axis=0))
    Ni = pm.Gamma('Ni', mu=lam, sd=sd, shape=num_transects,
                  transform=trafo, testval=observations.max(axis=0) + 1)
    
  • I think your model is unidentifiable: Couldn't you always just increase all population sizes and decrease prop? I can't see how this model could learn anything useful about prop. Where would this information come from?

  • Why would the population size depend on the survey length? Shouldn't this influence prop? Or is it an area?

aseyboldt
  • 1,090
  • 8
  • 14
  • In this field survey, "length" was how much time was spent at the plot counting birds, and longer survey length would presumably lead to larger counts, all else being equal. – David Diaz Mar 15 '17 at 14:12
  • The exponentiation of the linear model to calculate lambda should keep it positive and constrain Ni~Poisson(lambda) positive as well. I do think I should probably provide starting values for Ni that are larger than lambda, though. – David Diaz Mar 15 '17 at 14:18
  • Maybe I just don't understand the model, but I thought Ni are the true population sizes, of which you only observe a proportion prop. Then looking for a longer time should increase prop, not Ni. After all, the number of birds does not increase just because you look harder. But anyway, I don't see where information about prob should come from. If you had multiple observations for some animals, then I guess this could work (if you start seeing the same animals over and over, then you seem to have exhausted the population). – aseyboldt Mar 15 '17 at 16:47
  • Ups, I guess if length is the length of the trail then really the number of animals should be larger for longer trails. Sorry about that. – aseyboldt Mar 15 '17 at 16:48
  • Prob is the probability of successfully detecting each bird over Ni "tries", which is how it's being used in the Binomial distribution for Y_obs. In a more elaborate form of this kind of model, prob could vary based on local environmental covariates (e.g., forest cover %) or time of year. In the version I've posted, it would be constant for this species, and could be compared to estimates of prob for other species, for example. – David Diaz Mar 15 '17 at 17:21
  • I misspoke earlier about starting values for Ni... What I should've said is that I would need to make sure that Ni starts above maximum Y_obs. – David Diaz Mar 15 '17 at 17:46