
I'm trying to define a complex custom likelihood function using pymc3. The likelihood function involves a lot of iteration, and therefore I'm trying to use theano's scan method to define iteration directly within theano. Here's a greatly simplified example that illustrates the challenge that I'm facing. The (fake) likelihood function I'm trying to define is simply the sum of two pymc3 random variables, p and theta. Of course, I could simply return p+theta, but the actual likelihood function I'm trying to write is more complicated, and I believe I need to use theano.scan since it involves a lot of iteration.

import pymc3 as pm
from pymc3 import Model, Uniform, DensityDist
import theano.tensor as T
import theano
import numpy as np


### theano test
theano.config.compute_test_value = 'raise'

X = np.asarray([[1.0,2.0,3.0],[1.0,2.0,3.0]])
### pymc3 implementation
with Model() as bg_model:

    p = pm.Uniform('p', lower = 0, upper = 1)
    theta = pm.Uniform('theta', lower = 0, upper = .2)

    def logp(X):
        f = p+theta
        print("f",f)
        get_ll = theano.function(name='get_ll',inputs = [p, theta], outputs = f)
        print("p keys ",p.__dict__.keys())
        print("theta keys ",theta.__dict__.keys())
        print("p name ",p.name,"p.type ",p.type,"type(p)",type(p),"p.tag",p.tag)
        result=get_ll(p, theta)
        print("result",result)
        return result

    y = pm.DensityDist('y', logp, observed = X) # Nx4 y = f(f,x,tx,n | p, theta)

When I run this, I get the error:

TypeError: ('Bad input argument to theano function with name "get_ll"  at index 0(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')

I understand that the issue occurs in the line

result=get_ll(p, theta)

because p and theta are of type pymc3.TransformedRV, while the inputs to a compiled theano function need to be scalars or plain numpy arrays. However, a pymc3 TransformedRV does not seem to offer any obvious way of obtaining the current value of the random variable itself.

Is it possible to define a log likelihood function that involves the use of a theano function that takes as input a pymc3 random variable?

Chris Jones
Comment: Have you already searched for examples in the [PyMC3 repository](https://github.com/pymc-devs/pymc3/search?q=scan&type=Code&utf8=%E2%9C%93)? – aloctavodia Sep 08 '16 at 07:13

1 Answer


The problem is that get_ll is a compiled theano function (the result of theano.function), which takes numerical arrays as input. pymc3, however, is passing it symbolic variables (theano tensors), and that is why you're getting the error.

As to your solution, you're right that just returning p+theta is the way to go. If you have scans and whatnot in your logp, then you return the symbolic scan output of interest; there is no need to compile a theano function yourself, because pymc3 builds and compiles the model graph from whatever symbolic expression logp returns. For example, if you wanted to add 1 to each element of a vector (as an impractical toy example), you would do:

import theano

def logp(X):
    # scan returns (outputs, updates); the outputs stay symbolic
    the_sum, the_sum_upd = theano.scan(lambda x: x + 1, sequences=[X])
    return the_sum
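To sanity-check a scan-based logp against hand-computed numbers, it can help to have a plain-Python model of what scan computes. The sketch below is only that, not theano's implementation, and the helper name scan_reference is hypothetical. It mirrors two documented scan behaviours: fn receives the current sequence element followed by the previous output (when outputs_info supplies an initial value), and scan returns the output of every step, so an accumulated total is the last element.

```python
import numpy as np


def scan_reference(fn, sequence, init=None):
    """Hypothetical plain-NumPy stand-in for theano.scan with one sequence.

    Without `init`, fn(x_t) is applied to each element (a map).
    With `init`, fn(x_t, prev) carries the previous output forward,
    like scan with outputs_info=init. As with scan, every step's
    output is returned, not only the last one.
    """
    outputs = []
    prev = init
    for x_t in sequence:
        prev = fn(x_t) if init is None else fn(x_t, prev)
        outputs.append(prev)
    return np.asarray(outputs)


X = np.asarray([1.0, 2.0, 3.0])

# Same toy as above: add 1 to each element.
plus_one = scan_reference(lambda x: x + 1, X)

# Running total, the usual pattern for accumulating a log-likelihood;
# the complete total is the last output.
running = scan_reference(lambda x, acc: acc + x, X, init=0.0)
total = running[-1]
```

The accumulator form corresponds to passing outputs_info to theano.scan; in the symbolic version you would likewise return result[-1] from logp.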

That being said, if you need gradients, you would need to calculate your the_sum variable in a theano Op and provide a grad() method along with it (you can see a toy example of that on the answer here). If you do not need gradients, you might be better off doing everything in Python (or C, numba, or cython, for performance) and wrapping it with the as_op decorator.
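For the no-gradient route, the loop can be an ordinary NumPy function, because an op built with as_op receives numeric arrays rather than symbolic variables, which is exactly what get_ll in the question needed. The sketch below is hypothetical: loglike_numeric is an illustrative name, and its per-row term simply reuses the question's fake likelihood p+theta. The theano wrapping (theano.compile.ops.as_op with itypes/otypes) is left in a comment so the numeric part can be run and tested on its own.

```python
import numpy as np

# With theano available, the function would be wrapped roughly like this
# (dscalar in/out; as_op ops have no grad(), so a gradient-free sampler
# such as pm.Metropolis is needed instead of NUTS):
#
#   import theano.tensor as T
#   from theano.compile.ops import as_op
#
#   @as_op(itypes=[T.dscalar, T.dscalar], otypes=[T.dscalar])
#   def loglike_op(p, theta):
#       return loglike_numeric(p, theta, X)
#
# and logp would then return loglike_op(p, theta).

X = np.asarray([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])


def loglike_numeric(p, theta, X):
    """Hypothetical iterative log-likelihood evaluated on plain numbers.

    The per-row term is a placeholder (the question's fake p + theta);
    an explicit Python loop is fine here because p and theta arrive
    as numbers, not theano variables.
    """
    total = 0.0
    for _row in X:
        total += p + theta  # placeholder per-observation term
    # as_op expects outputs as NumPy arrays matching otypes (0-d float64)
    return np.asarray(total, dtype=np.float64)
```

The trade-off is the one described above: as_op is quick to write, but without a grad() the model cannot use gradient-based samplers.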
