I am attempting to compute the Jacobian of a TensorFlow neural network's outputs with respect to its inputs. This is easily achieved with the tf.GradientTape.jacobian
method. The trivial example provided in the TensorFlow documentation is as follows:
import tensorflow as tf

with tf.GradientTape() as g:
    x = tf.constant([1.0, 2.0])
    g.watch(x)
    y = x * x
jacobian = g.jacobian(y, x)  # [[2., 0.], [0., 4.]]
This is fine if I only want to compute the Jacobian for a single instance of the input tensor x. However, I need to evaluate this Jacobian many, many times for different instances of x. For a non-trivial Jacobian (e.g. of a deep convolutional neural network with non-linear activation functions), repeatedly rerunning the GradientTape computation and the jacobian method is incredibly expensive.
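Concretely, the pattern I mean looks roughly like the sketch below; model, jacobian_at, and the random inputs are just placeholders standing in for my actual network and evaluation loop.

import tensorflow as tf

# "model" is a stand-in for the actual network (mine is a deep CNN);
# a small dense model keeps the sketch self-contained.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2),
])

def jacobian_at(x):
    # A fresh tape is recorded and differentiated on every call;
    # this is the part that becomes expensive when repeated many times.
    with tf.GradientTape() as g:
        g.watch(x)
        y = model(x)
    return g.jacobian(y, x)

for _ in range(1000):
    x = tf.random.uniform((1, 4))
    jac = jacobian_at(x)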
I know from the TensorFlow documentation that the gradients (and hence the Jacobian) are computed via automatic differentiation. I have to imagine there is some internal representation of the analytical gradient of the network, built by automatic differentiation, which is then evaluated at the given inputs.
My question: am I correct in assuming that TensorFlow builds and stores (at least parts of) the analytical gradients needed to compute the Jacobian? And if so, is there a way to save this analytical gradient and re-evaluate the Jacobian with new inputs without having to reconstruct it via the GradientTape method?
A "persistent" GradientTape does not seem to solve this issue: it only allows for the repeated evaluation of a single GradientTape instance with respect to multiple internal arguments of the computation.