I'm trying to use Theano to compute the hessian of a function with respect to a vector as well as a couple scalars (edit: that is, I essentially want the scalars appended to the vector that I am computing the hessian with respect to). Here's a minimal example:

import theano
import theano.tensor as T
A = T.vector('A')
b,c = T.scalars('b','c')
y = T.sum(A)*b*c

My first try was:

hy = T.hessian(y,[A,b,c])

Which fails with AssertionError: tensor.hessian expects a (list of) 1 dimensional variable as 'wrt'

My second try was to combine A, b, and c with:

wrt = T.concatenate([A,T.stack(b,c)])
hy = T.hessian(y,[wrt])

Which fails with DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Join.0

What is the correct way to compute the hessian in this case?

Update: To clarify on what I am looking for, suppose A is a 2 element vector. Then the Hessian would be:

[[d2y/d2A1, d2y/dA1dA2, d2y/dA1dB, d2y/dA1dC],
[d2y/dA2dA1, d2y/d2A2, d2y/dA2dB, d2y/dA2dC],
[d2y/dBdA1, d2y/dBdA2, d2y/d2B, d2y/dBdC],
[d2y/dCdA1, d2y/dCdA2, d2y/dCdB, d2y/d2C]]

which for the example function y should be:

[[0, 0, C, B],
[0, 0, C, B],
[C, C, 0, A1+A2],
[B, B, A1+A2, 0]]

So if we were to define a function:

f = theano.function([A,b,c], hy)

then, assuming we could compute hy successfully, we would expect the output:

f([1,1], 4, 5) = 
    [[0, 0, 5, 4],
    [0, 0, 5, 4],
    [5, 5, 0, 2],
    [4, 4, 2, 0]]

In my actual application, A has 25 elements and y is more complicated, but the idea is the same.
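As a sanity check on the target matrix above, here is a minimal sketch in plain Python (no Theano) that builds the expected joint Hessian of y = sum(A)*b*c analytically for an A of any length. The helper name `expected_hessian` is hypothetical and only illustrates the block structure: a zero block for the A/A terms, c and b in the A/b and A/c columns, and sum(A) in the b/c entries.

```python
def expected_hessian(A, b, c):
    """Analytic Hessian of y = sum(A) * b * c with respect to the
    packed vector [A[0], ..., A[n-1], b, c] (hypothetical helper)."""
    n = len(A)
    size = n + 2
    # Every pure second derivative is zero because y is linear in each variable.
    H = [[0.0] * size for _ in range(size)]
    for i in range(n):
        # d2y/dA_i db = c and d2y/dA_i dc = b, mirrored for symmetry.
        H[i][n] = H[n][i] = float(c)
        H[i][n + 1] = H[n + 1][i] = float(b)
    # d2y/db dc = sum(A), mirrored for symmetry.
    H[n][n + 1] = H[n + 1][n] = float(sum(A))
    return H


for row in expected_hessian([1, 1], 4, 5):
    print(row)
```

For A = [1, 1], b = 4, c = 5 this reproduces the 4x4 matrix listed above, and the same function gives a reference result for a 25-element A.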

Amir
  • The hessian is only defined for scalar variables and not for vectors. What you probably want is to compute the hessian for each element of the vector. Do you know the length of A? Then you could try hy = T.hessian(y,[A[0],A[1],A[2],b,c]) if A has the length of 3. – Randrian Dec 13 '15 at 19:03
  • Yes, that is exactly what I am trying to do, not compute the hessian with respect to scalars but rather concatenate those scalars onto the vector that I am using to compute the hessian. Your example doesn't work though, it has the same issue as my first attempt where hessian expects all the wrt elements to be vectors rather than scalars, whereas the behavior I am expecting is to treat the scalars as components of a vector. – Kevin Zielnicki Dec 13 '15 at 19:43
  • Perhaps you should write the whole code of the function. It could help to figure out what you want to do and a minimal working example helps others to try out solutions. – Randrian Dec 13 '15 at 20:57
  • I added some additional details on what kind of result I am expecting, hopefully that helps clarify my question. – Kevin Zielnicki Dec 14 '15 at 06:57

2 Answers

1

If you pass b and c as vectors, it should work. The hessian operator expects 1D arrays. Arguably scalars ought to be accepted too, but it is probably easiest to just provide the type of input it expects.

The reason your stacking fails is that the concatenate operation yields a new, non-leaf variable on a separate branch of the graph: wrt is computed from A, b, and c, but y is not computed from wrt. Since y does not depend on that new variable, Theano cannot differentiate y with respect to it and raises DisconnectedInputError.

This works for me:

import theano.tensor as T
A = T.vector('A')
b,c = T.vectors('b','c')
y = T.sum(A)*b[0]*c[0]

hy = T.hessian(y,[A,b,c])
eickenberg
  • Thanks for the reply! This doesn't quite do what I was hoping for unfortunately: `f = theano.function([A,b,c], hy)`; `f([1,1],[4],[5]) = [array([[ 0., 0.], [ 0., 0.]]), array([[ 0.]]), array([[ 0.]])]` Rather than computing one hessian with all of the terms, it computes 3 hessians. – Kevin Zielnicki Dec 14 '15 at 17:25
  • Hmm OK. And concatenating the input on the numpy level is impossible? – eickenberg Dec 14 '15 at 17:29
  • Hmm, that might work but would be pretty awkward. Is there no way to join variables without making them into new variables that you can't take derivatives of? – Kevin Zielnicki Dec 14 '15 at 17:33
1

Based on a suggestion from @eickenberg to combine the inputs at the numpy level, I used the following workaround:

import theano
import theano.tensor as T

A,temp = T.vectors('A','T')
b,c = T.scalars('b','c')

y = T.sum(A)*b*c
y2 = theano.clone(y,{A:temp[:-2],b:temp[-2],c:temp[-1]})

hy = T.hessian(y2,[temp])
f = theano.function([temp], hy)

f([1,1,4,5])

gives the expected output:

> [array([[ 0.,  0.,  5.,  4.],
>         [ 0.,  0.,  5.,  4.],
>         [ 5.,  5.,  0.,  2.],
>         [ 4.,  4.,  2.,  0.]])]

This works but feels rather awkward; if anyone knows of a better (more general) solution, please let me know.
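To cross-check the packed-vector result independently of Theano, here is a rough sketch in plain Python. It uses the same packing convention as the workaround (A in x[:-2], b in x[-2], c in x[-1]) and approximates the Hessian with central finite differences rather than symbolic differentiation. The names `y_packed` and `numerical_hessian` and the step size h are assumptions made for illustration.

```python
def y_packed(x):
    """y = sum(A) * b * c, with the inputs packed as x = [A..., b, c]."""
    A, b, c = x[:-2], x[-2], x[-1]
    return sum(A) * b * c


def numerical_hessian(f, x, h=1e-3):
    """Approximate the Hessian of f at x with central finite differences.
    Hypothetical helper; h is an assumed step size, not a tuned value."""
    n = len(x)

    def shifted(i, di, j, dj):
        # Evaluate f with x[i] moved by di and x[j] moved by dj.
        z = list(x)
        z[i] += di
        z[j] += dj
        return f(z)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                       - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4 * h * h)
    return H


H = numerical_hessian(y_packed, [1.0, 1.0, 4.0, 5.0])
for row in H:
    print([round(v, 6) for v in row])
```

Up to floating-point rounding, this should agree with the matrix that f([1,1,4,5]) returns above. It is only a numerical check, not a replacement for the symbolic Hessian.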