
I am building a sequence-to-one prediction model using an LSTM. My data is a time series with 4 input variables and 1 output variable that needs to be predicted. The total length of the data is 38265 timesteps, held in a DataFrame of size 38265 × 5.

I want to use the previous 20 timesteps of the 4 input variables to predict my output variable. I am using the code below for this purpose.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=120, activation='relu', return_sequences=False,
               input_shape=(train_in.shape[1], 5)))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
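
For reference, train_in is built from the DataFrame with a sliding window along these lines (a sketch of the idea; the model above expects 5 features per timestep, so all 5 columns go into each window, and data is the DataFrame described above):

import numpy as np

lookback = 20
values = data.values  # the (38265, 5) DataFrame as an array
X, Y = [], []
for i in range(lookback, len(values)):
    X.append(values[i - lookback:i, :])  # previous 20 timesteps, all 5 columns
    Y.append(values[i, -1])              # output variable at the current step
train_in = np.array(X)   # shape (38245, 20, 5), matching input_shape above
train_out = np.array(Y)  # shape (38245,)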

I want to calculate the Jacobian of the output variable w.r.t. the LSTM model function using tf.GradientTape. Can anyone help me out with this?

Pradyumna TK

3 Answers


The Jacobian of the output with respect to the LSTM input can be extracted as follows:

  1. Using tf.GradientTape(), we can compute the Jacobian from the recorded gradient flow.

  2. However, to get the Jacobian, the input needs to be a tf.EagerTensor, which is usually what we have when we inspect the Jacobian of the output after executing y = model(x). The following code snippet shows this idea:

# Get the Jacobian for each persistent gradient evaluation
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='relu'))
x = tf.constant([[5., 6., 3.]])

with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    # Forward pass
    tape.watch(x)
    y = model(x)
    loss = tf.reduce_mean(y**2)

print('Gradients\n')
jacobian_wrt_loss = tape.jacobian(loss, x)  # scalar loss, so same shape as x
print(f'{jacobian_wrt_loss}\n')
jacobian_wrt_y = tape.jacobian(y, x)        # shape (1, 2, 1, 3)
print(f'{jacobian_wrt_y}\n')
  3. But for getting intermediate outputs, such as in this case, many samples use Keras. When we pull the outputs out of model.layers[i].output, the type is a KerasTensor instead of an EagerTensor, and building the Jacobian requires an EagerTensor. (This holds even after many attempts with @tf.function wrapping, since eager execution is already the default in TF >= 2.0.)

  4. So alternatively, an auxiliary model can be created with only the layers required (in this case, just the Input and LSTM layers). The output of this model will be a tf.EagerTensor, which is suitable for building the Jacobian. This is shown in the following snippet:

# General syntax for getting Jacobians of an intermediate layer's output
import tensorflow as tf

x = tf.constant([[15., 60., 32.]])

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2, activation='relu', name='dense_1'))
model.add(tf.keras.layers.Dense(2, activation='relu', name='dense_2'))
model.build(input_shape=(None, 3))  # build so the layers get their weights

# The auxiliary model reuses the original dense_1 layer, so its output
# (and hence its Jacobian) refers to the same weights as the full model.
# Note: [layer.output for layer in model.layers] yields KerasTensors,
# which tape.jacobian() cannot consume; hence the auxiliary model.
aux_model = tf.keras.Sequential([model.get_layer('dense_1')])

with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    # Forward pass
    tape.watch(x)
    x_y = model(x)        # output of the full model
    act_y = aux_model(x)  # output of just the first Dense layer

print('Jacobian of full FFNN\n')
jacobian = tape.jacobian(x_y, x)
print(f'{jacobian[0]}\n')

print('Jacobian of FFNN with just the first Dense layer\n')
jacobian = tape.jacobian(act_y, x)
print(f'{jacobian[0]}\n')

Here I used a simple FFNN consisting of 2 Dense layers, but I wanted the Jacobian w.r.t. the output of the first Dense layer, so I created an auxiliary model containing just that layer and computed the Jacobian from its output. Note that the auxiliary model must reuse (or copy) the original model's layer weights; a freshly initialized layer would give the Jacobian of a different function.
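
Applied to the question's LSTM model, the same auxiliary-model idea would look roughly like this (a sketch; the random input stands in for a real (20, 5) window, and the auxiliary model reuses the trained LSTM layer):

import tensorflow as tf

x = tf.random.normal((1, 20, 5))  # one (20, 5) input window
full_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(120, activation='relu', input_shape=(20, 5)),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(1),
])
aux_model = tf.keras.Sequential([full_model.layers[0]])  # shared LSTM layer

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = full_model(x)  # final prediction, shape (1, 1)
    h = aux_model(x)   # LSTM output, shape (1, 120)

print(tape.jacobian(y, x).shape)  # (1, 1, 1, 20, 5)
print(tape.jacobian(h, x).shape)  # (1, 120, 1, 20, 5)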

The details can be found here.


With help from @Abhilash Majumder, I have done it this way. I am posting it here so that it might help someone in the future.

import numpy as np
import pandas as pd
import tensorflow as tf

tf.compat.v1.enable_eager_execution()  # enable eager execution; this is a must
tf.executing_eagerly()  # check whether eager execution is enabled; should return True

data = pd.read_excel("FileName or Location ")
# My data is a DataFrame with 127549 rows and 5 columns (127549 x 5)

a = data[:20]    # shape is (20, 5)
b = data[50:70]  # shape is (20, 5)
A = [a, b]       # make a list
A = np.array(A)  # convert into an array of size (2, 20, 5)

At = tf.convert_to_tensor(A, np.float32)  # convert into a tensor
At.shape  # TensorShape([Dimension(2), Dimension(20), Dimension(5)])

from tensorflow.keras.models import load_model

model = load_model('EKF-LSTM-1.h5')  # load the trained model
# This is the trained model shown in the question above;
# its output is a single value

with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    tape.watch(At)
    y1 = model(At)  # defining the output as a function of the input variables
    print(y1, type(y1))

# Output
tf.Tensor([[0.04251503],[0.04634088]], shape=(2, 1), dtype=float32) <class 'tensorflow.python.framework.ops.EagerTensor'>

jacobian = tape.jacobian(y1, At)  # Jacobian of the output w.r.t. both inputs
jacobian.shape

Output

TensorShape([Dimension(2), Dimension(1), Dimension(2), Dimension(20), Dimension(5)])

Here I calculated the Jacobian w.r.t. 2 inputs, each of size (20, 5). If you want the Jacobian w.r.t. only the first input, index the full Jacobian rather than slicing the input (At[0] creates a new tensor outside the tape, so tape.jacobian(y1, At[0]) has no recorded path to the output and returns None):

jacobian = tape.jacobian(y1, At)[:, :, 0]  # Jacobian of the outputs w.r.t. only the 1st input in 'At'
jacobian.shape

Output

TensorShape([Dimension(2), Dimension(1), Dimension(20), Dimension(5)])
Pradyumna TK

For those looking to compute the Jacobian over a batch of inputs and outputs where output[j] is independent of input[i] whenever i != j, consider the batch_jacobian method.

This will reduce the number of dimensions in your computed Jacobian tensor by one and could be the difference between running out of memory and not.

See: batch_jacobian in the TensorFlow GradientTape docs.
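
A minimal sketch of the difference, assuming a batched model like the one in the question (the shapes are illustrative):

import tensorflow as tf

x = tf.random.normal((8, 20, 5))  # batch of 8 independent (20, 5) inputs
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1),
])

with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)  # shape (8, 1)

# tape.jacobian(y, x) would return shape (8, 1, 8, 20, 5), mostly zeros;
# batch_jacobian drops the cross-batch dimension instead
jac = tape.batch_jacobian(y, x)
print(jac.shape)  # (8, 1, 20, 5)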

wirrell