
I am interested in computing the derivative of a matrix determinant using TensorFlow. I can see from experimentation that TensorFlow has not implemented a method of differentiating through a determinant:

LookupError: No gradient defined for operation 'MatrixDeterminant' 
(op type: MatrixDeterminant)

A little further investigation revealed that it is actually possible to compute the derivative; see, for example, Jacobi's formula (spelled out below). I determined that, in order to implement this means of differentiating through a determinant, I need to use the function decorator:

@tf.RegisterGradient("MatrixDeterminant")
def _sub_grad(op, grad):
    ...

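For reference, Jacobi's formula gives the derivative of the determinant in closed form (assuming A is invertible):

d(det(A)) / dA_ij = det(A) * (A^{-1})_ji

That is, the gradient of det(A) with respect to A is det(A) times the transpose of the inverse of A, which is exactly what the gradient function needs to compute.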
However, I am not familiar enough with TensorFlow to understand how this can be accomplished. Does anyone have any insight on this matter?

Here's an example where I run into this issue:

x = tf.Variable(tf.ones(shape=[1]))
y = tf.Variable(tf.ones(shape=[1]))

# A = [[sin(x), 0], [0, cos(y)]]
A = tf.reshape(
    tf.pack([tf.sin(x), tf.zeros([1]), tf.zeros([1]), tf.cos(y)]), (2, 2)
)
loss = tf.square(tf.matrix_determinant(A))


optimizer = tf.train.GradientDescentOptimizer(0.001)
train = optimizer.minimize(loss)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)


for step in xrange(100):
    sess.run(train)
    print sess.run(x)
user1936768

3 Answers


Please check the "Implement Gradient in Python" section here.

In particular, you can implement it as follows:

import tensorflow as tf
from tensorflow.python.framework import ops

@ops.RegisterGradient("MatrixDeterminant")
def _MatrixDeterminantGrad(op, grad):
  """Gradient for MatrixDeterminant.

  Uses formula 2.2.4 from "An extended collection of matrix derivative
  results for forward and reverse mode algorithmic differentiation"
  by Mike Giles -- http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf
  """
  A = op.inputs[0]       # input matrix
  C = op.outputs[0]      # det(A), computed in the forward pass
  Ainv = tf.matrix_inverse(A)
  return grad * C * tf.transpose(Ainv)  # chain rule: grad * det(A) * inv(A)^T
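As a quick sanity check (a plain NumPy sketch), Jacobi's formula can be evaluated directly and compared with what the function above returns for grad = 1:

import numpy as np

A0 = np.array([[1., 2.], [3., 4.]], dtype=np.float32)
# Jacobi's formula: d det(A) / dA = det(A) * inv(A)^T
print(np.linalg.det(A0) * np.linalg.inv(A0).T)  # [[ 4. -3.] [-2.  1.]]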

Then a simple training loop to check that it works:

import numpy as np

a0 = np.array([[1, 2], [3, 4]]).astype(np.float32)
a = tf.Variable(a0)
b = tf.square(tf.matrix_determinant(a))
init_op = tf.initialize_all_variables()
sess = tf.InteractiveSession()
init_op.run()

minimization_steps = 50
learning_rate = 0.001
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(b)

losses = []
for i in range(minimization_steps):
  train_op.run()
  losses.append(b.eval())

Then you can visualize your loss over time:

import matplotlib.pyplot as plt

plt.ylabel("Determinant Squared")
plt.xlabel("Iterations")
plt.plot(losses)
plt.show()

You should see something like this: [plot: Determinant Squared vs. Iterations]

Yaroslav Bulatov
  • Very cool! For some reason the docs on tf are causing issues, e.g. from the links above: http://tensorflow.org/how_tos/adding_an_op/index.md#AUTOGENERATED-implement-the-gradient-in-python – Blaze Nov 18 '15 at 23:31

I think you are confused about what the derivative of a matrix determinant is.

The matrix determinant is a function calculated from the elements of the matrix by a fixed formula. So if all the elements of the matrix are numbers, the determinant will give you just one number and the derivative will be 0. When some of the elements are variables, you will get an expression in those variables. For example:

| x    x^2    |
| 1    sin(x) |

The determinant will be x*sin(x) - x^2 and the derivative is sin(x) + x*cos(x) - 2x. Jacobi's formula just connects the derivative of the determinant with the adjugate matrix.
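A quick way to verify this (a sketch assuming SymPy is available):

import sympy as sp

x = sp.symbols('x')
M = sp.Matrix([[x, x**2], [1, sp.sin(x)]])
det = M.det()            # x*sin(x) - x**2
print(sp.diff(det, x))   # x*cos(x) + sin(x) - 2*x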


In your example, your matrix A consists only of numbers, and therefore the determinant is just a number and the loss is just a number as well. GradientDescentOptimizer needs some free variables to minimize, and it does not have any because your loss is just a number.

Salvador Dali
  • The real problem here is that MatrixDeterminant class does not provide a registered gradient. – user1936768 Nov 15 '15 at 01:09
  • @user1936768 yes, that is why you got the error in your Python program, but it is not the real reason. Assume the gradient method exists. It will always return 0 no matter what. Will that be any help in your 100 iterations? How exactly will it minimize anything? – Salvador Dali Nov 15 '15 at 01:15
  • No the gradient will not be zero. I am minimizing with respect to x and y, and the matrix depends on x and y through sin and cos respectively. – user1936768 Nov 15 '15 at 01:19
  • @user1936768 look carefully at your example. `x` and `y` are matrices which consist of only numbers. – Salvador Dali Nov 15 '15 at 01:33
  • This isn't how TensorFlow works. Notice the Variable wrapper around x and y. Consider that if I make `loss = tf.square(tf.reduce_sum(tf.matrix_inverse(A)))` then I will indeed iterate towards an optimal solution `x = 1.57079494, y = 1.09198811e-18` which is not what I initialized `x` or `y` to. – user1936768 Nov 15 '15 at 01:36
  • At any rate, this is beside the point. My example was only an example. It is easy to imagine (more) situations where the matrix depends on variables. In these cases, the fact that MatrixDeterminant does not have a gradient presents a problem. – user1936768 Nov 15 '15 at 01:37
  • @Salvador, you're misunderstanding this. There is a method to calculate the determinant, a number, from the numbers in the input matrix. We're talking about the derivative of the determinant *operation* with respect to each element of the input matrix, evaluated at the current value of the input. What is the derivative of `x**2` when `x=5`? – mdaoust Nov 18 '15 at 17:50

For those who are interested, here is the solution that works for my problem:

@tf.RegisterGradient("MatrixDeterminant")
def _MatrixDeterminant(op, grad):
    """Gradient for MatrixDeterminant via Jacobi's formula."""
    # d det(A)/dA = det(A) * inv(A)^T; multiply by the incoming
    # gradient `grad` for the chain rule.
    return grad * op.outputs[0] * tf.transpose(tf.matrix_inverse(op.inputs[0]))
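As a quick check, assuming the graph from the question above (x, y, loss, sess) has been built after registering this gradient, tf.gradients now returns tensors instead of raising the LookupError:

# With the gradient registered, this no longer raises
# "LookupError: No gradient defined for operation 'MatrixDeterminant'".
grads = tf.gradients(loss, [x, y])
print(sess.run(grads))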
user1936768