I'm trying to use Theano to compute the Hessian of a function with respect to a vector together with a couple of scalars (edit: that is, I essentially want the scalars appended to the vector that I am differentiating with respect to). Here's a minimal example:
import theano
import theano.tensor as T
A = T.vector('A')
b,c = T.scalars('b','c')
y = T.sum(A)*b*c
My first try was:
hy = T.hessian(y,[A,b,c])
Which fails with AssertionError: tensor.hessian expects a (list of) 1 dimensional variable as 'wrt'
My second try was to combine A, b, and c with:
wrt = T.concatenate([A,T.stack(b,c)])
hy = T.hessian(y,[wrt])
Which fails with DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Join.0
What is the correct way to compute the hessian in this case?
Update: To clarify on what I am looking for, suppose A is a 2 element vector. Then the Hessian would be:
[[d2y/d2A1, d2y/dA1dA2, d2y/dA1dB, d2y/dA1dC],
[d2y/dA2dA1, d2y/d2A2, d2y/dA2dB, d2y/dA2dC],
[d2y/dBdA1, d2y/dBdA2, d2y/d2B, d2y/dBdC],
[d2y/dCdA1, d2y/dCdA2, d2y/dCdB, d2y/d2C]]
which for the example function y should be:
[[0, 0, C, B],
[0, 0, C, B],
[C, C, 0, A1+A2],
[B, B, A1+A2, 0]]
So if we were to define a function:
f = theano.function([A,b,c], hy)
then, assuming we could compute hy successfully, we would expect the output:
f([1,1], 4, 5) =
[[0, 0, 5, 4],
[0, 0, 5, 4],
[5, 5, 0, 2],
[4, 4, 2, 0]]
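As a sanity check on those expected values, independent of Theano, here is a minimal pure-Python sketch that approximates the Hessian with central finite differences. The names `y` and `numerical_hessian` are my own, made up for illustration (they are not Theano functions), and the parameter list simply treats b and c as appended to A, which is the layout I am after.

```python
def y(params):
    # params = [A1, ..., An, b, c]: the two scalars are appended to the vector A.
    *A, b, c = params
    return sum(A) * b * c


def numerical_hessian(func, x, eps=1e-3):
    # Central finite-difference approximation of the Hessian of func at x.
    # Hypothetical helper for checking values only; not a replacement for
    # a symbolic Hessian.
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate func with x[i] moved by di and x[j] moved by dj.
                p = list(x)
                p[i] += di
                p[j] += dj
                return func(p)

            H[i][j] = (shifted(eps, eps) - shifted(eps, -eps)
                       - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)
    return H


# Same point as f([1,1], 4, 5) above: A = [1, 1], b = 4, c = 5.
H = numerical_hessian(y, [1.0, 1.0, 4.0, 5.0])
H_rounded = [[round(v) for v in row] for row in H]
print(H_rounded)
```

Rounded to integers, this should reproduce the 4x4 matrix above, so the expected output itself is not in question; what I am missing is how to get Theano to build it symbolically.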
In my actual application, A has 25 elements and y is more complicated, but the idea is the same.