Imagine a tiny network defined as follows, where linear is a typical helper function defining TensorFlow variables for a weight matrix and activation function:
final_layer = linear(linear(_input,10,tf.nn.tanh),20)
Normally this would be optimized via gradient descent on a loss:
loss = tf.reduce_sum(tf.square(final_layer - _target))
train_step = tf.train.AdamOptimizer().minimmize(loss)
But assume I'm getting the derivatives of the loss w.r.t. final_layer from an external source (e.g. a tf.placeholder named _deriv). How can I use this gradient information with one of the builtin optimizers to backpropagate and update the network parameters?
The workaround I'm currently using is to construct an artificial loss consisting of the inner product between _deriv and final_layer (since the derivatives of this loss w.r.t. final_layer will be equal to _deriv).
loss = tf.reduce_sum(final_layer*_deriv)
train_step = tf.train.AdamOptimizer().minimmize(loss)
This is very wasteful though, as it needs to do this unnecessary inner product and calculate its derivative for every training step even though I already know this information. Is there a better way?
For those thinking this an odd thing to need to do, it is necessary for implementing synthetic gradients.