
How do I find the Dirichlet priors using PyMC3?

I've tried the following:

import pymc3 as pm
import numpy as np

population = [139212, 70192, 50000, 21000, 16000, 5000, 2000, 500, 600, 100, 10, 5, 5, 5, 5]

with pm.Model() as model:
    zipfy = pm.Dirichlet('zipfy', a=np.array([1.]), observed=population)
    tr = pm.sample(100)

but it threw a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-623c21a4f35f> in <module>()
      1 with pm.Model() as model:
      2     zipfy = pm.Dirichlet('zipfy', a=np.array([1.]), observed=population)
----> 3     tr = pm.sample(100)

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain, njobs, tune, progressbar, model, random_seed)
    147         # By default, use NUTS sampler
    148         pm._log.info('Auto-assigning NUTS sampler...')
--> 149         start_, step = init_nuts(init=init, n_init=n_init, model=model)
    150         if start is None:
    151             start = start_

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/pymc3/sampling.py in init_nuts(init, n_init, model, **kwargs)
    432 
    433     if init == 'advi':
--> 434         v_params = pm.variational.advi(n=n_init)
    435         start = pm.variational.sample_vp(v_params, 1, progressbar=False, hide_transformed=False)[0]
    436         cov = np.power(model.dict_to_array(v_params.stds), 2)

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/pymc3/variational/advi.py in advi(vars, start, model, n, accurate_elbo, optimizer, learning_rate, epsilon, random_seed)
    122     # Create variational gradient tensor
    123     elbo, shared = _calc_elbo(vars, model, n_mcsamples=n_mcsamples,
--> 124                               random_seed=random_seed)
    125 
    126     # Set starting values

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/pymc3/variational/advi.py in _calc_elbo(vars, model, n_mcsamples, random_seed)
    179     logpt = tt.add(*map(tt.sum, factors))
    180 
--> 181     [logp], inarray = pm.join_nonshared_inputs([logpt], vars, shared)
    182 
    183     uw = tt.vector('uw')

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/pymc3/theanof.py in join_nonshared_inputs(xs, vars, shared, make_shared)
    180     inarray : vector of inputs
    181     """
--> 182     joined = tt.concatenate([var.ravel() for var in vars])
    183 
    184     if not make_shared:

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/theano/tensor/basic.py in concatenate(tensor_list, axis)
   4608             "or a list, make sure you did not forget () or [] around "
   4609             "arguments of concatenate.", tensor_list)
-> 4610     return join(axis, *tensor_list)
   4611 
   4612 

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/theano/tensor/basic.py in join(axis, *tensors_list)
   4357         return tensors_list[0]
   4358     else:
-> 4359         return join_(axis, *tensors_list)
   4360 
   4361 

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

/Users/liling.tan/Library/Python/3.5/lib/python/site-packages/theano/tensor/basic.py in make_node(self, *axis_and_tensors)
   4080         axis, tensors = axis_and_tensors[0], axis_and_tensors[1:]
   4081         if not tensors:
-> 4082             raise ValueError('Cannot join an empty list of tensors')
   4083         as_tensor_variable_args = [as_tensor_variable(x) for x in tensors]
   4084 

ValueError: Cannot join an empty list of tensors

Edit:

zipfy is supposed to be the parameter vector.

I've tried this:

import pymc3 as pm
import numpy as np
population = [139212, 70192, 50000, 21000, 16000, 5000, 2000, 500, 600, 100, 10, 5, 5, 5, 5]
with pm.Model() as model:
    zipfy = pm.Dirichlet('zipfy', a=np.array(population))
    tr = pm.sample(100)

print(tr['zipfy'])
print(len(tr['zipfy']), len(tr['zipfy'][0]))

[out]:

array([[  4.57466959e-01,   2.30024576e-01,   1.63655813e-01, ...,
      2.79491587e-05,   1.15471055e-05,   1.21639409e-05],
   [  4.57466959e-01,   2.30024576e-01,   1.63655813e-01, ...,
      2.79491587e-05,   1.15471055e-05,   1.21639409e-05],
   [  4.57550769e-01,   2.30182026e-01,   1.63985544e-01, ...,
      1.61401840e-05,   2.90679821e-05,   2.47148304e-05],
   ..., 
   [  4.56878341e-01,   2.31362382e-01,   1.63956669e-01, ...,
      1.04361219e-05,   5.42454872e-06,   2.51727193e-05],
   [  4.57542706e-01,   2.30122065e-01,   1.63973662e-01, ...,
      2.16784018e-05,   5.42709076e-06,   7.35651589e-06],
   [  4.56698065e-01,   2.30786537e-01,   1.64125287e-01, ...,
      7.14659176e-06,   2.90901488e-05,   1.87413211e-05]])

   (100, 15)

I was expecting the Dirichlet prior (i.e. the parameter array) to be of size 1, but it's of size 100. Is that the expected behavior? How should the output of tr['zipfy'] be interpreted? Ah, is the trace just the 100 steps from pm.sample(100)? And are the priors stored inside the model object?
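If that reading is right, my guess (an untested sketch, reusing tr from the run above) is that collapsing the 100 draws into a single estimate would look like this:

# each row of tr['zipfy'] should be one draw of the 15-dimensional simplex,
# so averaging over the draws (axis=0) would give a single length-15 estimate
posterior_mean = tr['zipfy'].mean(axis=0)
print(posterior_mean.shape)  # (15,)
print(posterior_mean.sum())  # ~1.0 if each draw really lies on the simplex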

So given a discrete value, let's say 5, how do I find the Dirichlet priors that I've just learnt from the sampler? Are they inside the model object or the zipfy object?
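For concreteness, if "5" means the 5th rank (index 4), my guess (again an untested sketch built on tr above) is that I'd read it off the trace rather than the model object:

# hypothetical lookup: probability mass the fitted Dirichlet assigns to the 5th rank
print(tr['zipfy'].mean(axis=0)[4])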

  • I'm not sure what exactly you are trying to do. Your model does not have any parameters at all, that's why pymc3 throws an error. Also, Dirichlet is a distribution over a simplex, so the observed values don't make sense. If population is supposed to be the parameter vector $\alpha$, you could use `zipfy = pm.Dirichlet('zipfy', a=np.array(population))`. – aseyboldt Mar 23 '17 at 18:49
  • @aseyboldt the context of the problem is something like this: https://twitter.com/alvations/status/844778923711852546 . Yes, zipfy is supposed to be a parameter vector. I've updated the question. – alvas Mar 24 '17 at 00:38
  • More specifically: https://twitter.com/AlxCoventry/status/844794108879134720 – alvas Mar 24 '17 at 01:04
  • I still don't understand what you mean. Can you maybe specify what data you observe, what parameters you are trying to estimate, and what prior knowledge you have about those parameters? – aseyboldt Mar 26 '17 at 19:00
