Am I using tf.math.reduce_sum in the attention model in the right way?

Question

I was trying to use the attention model described here in a simple bidirectional LSTM model. However, after adding the attention model, I got this error:

ValueError: Unknown initializer: GlorotUniform

To begin with, my code does not mix TensorFlow in some parts with standalone Keras in others, so that kind of incompatibility is not the issue. I also tried every solution suggested in this post, but none of them worked for me. I should mention that my code worked without issues before I added the attention model. So I tried removing the attention part of the network line by line to see which line causes the problem:

import numpy as np
import tensorflow as tf

# n_timesteps, n_features and n_outputs are defined earlier in my script
inputs = tf.keras.layers.Input(shape=(n_timesteps, n_features))
units = 50
activations = tf.keras.layers.Bidirectional(
    tf.compat.v1.keras.layers.CuDNNLSTM(units, return_sequences=True),
    merge_mode='concat')(inputs)
print(np.shape(activations))

# Implementation of attention
x1 = tf.keras.layers.Dense(1, activation='tanh')(activations)
print(np.shape(x1))
x1 = tf.keras.layers.Flatten()(x1)
print(np.shape(x1))
x1 = tf.keras.layers.Activation('softmax')(x1)
print(np.shape(x1))
x1 = tf.keras.layers.RepeatVector(units*2)(x1)
print(np.shape(x1))
x1 = tf.keras.layers.Permute([2, 1])(x1)
print(np.shape(x1))
sent_representation = tf.keras.layers.Multiply()([activations, x1])
print(np.shape(sent_representation))
sent_representation = tf.keras.layers.Lambda(
    lambda xin: tf.keras.backend.sum(xin, axis=-2),
    output_shape=(units*2,))(sent_representation)

# softmax for classification
x = tf.keras.layers.Dense(n_outputs, activation='softmax')(sent_representation)
model = tf.keras.models.Model(inputs=inputs, outputs=x)

I realized that the line with the Lambda function and tf.keras.backend.sum is the one causing the error. So, after some searching, I decided to replace that line with the following:

sent_representation = tf.math.reduce_sum(sent_representation, axis=-2)

Now, my code works. However, I am not quite sure if this substitution is correct. Am I doing this right?

Edit: Here are the next lines of the code. The problem occurs when I try to load the best model for testing:

optimizer = tf.keras.optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9)
model.compile(loss=lossFunction, optimizer=optimizer, metrics=['accuracy'])
print(model.summary())

# early stopping
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min',
                                      verbose=1, patience=20)
mc = tf.keras.callbacks.ModelCheckpoint('best_model.h5',
                                        monitor='val_accuracy', mode='max',
                                        verbose=1, save_best_only=True)
history = model.fit(trainX, trainy, validation_data=(valX, valy),
                    shuffle=True, epochs=epochs, verbose=0,
                    callbacks=[es, mc])
saved_model = tf.keras.models.load_model(
    'best_model.h5',
    custom_objects={"GlorotUniform": tf.keras.initializers.glorot_uniform()})
# evaluate the model
_, train_acc = saved_model.evaluate(trainX, trainy, verbose=0)  # saved_model
_, val_acc = saved_model.evaluate(valX, valy, verbose=0)  # saved_model
_, accuracy = saved_model.evaluate(testX, testy, verbose=0)  # saved_model
print('Train: %.3f, Validation: %.3f, Test: %.3f' % (train_acc, val_acc, accuracy))
y_pred = saved_model.predict(testX, batch_size=64, verbose=1)

Do you see any problem in my code that might cause the error I get when I use a Lambda layer?

Marco Cerliani · Accepted Answer · 2020-08-25T07:38:52.287

The code you provided works for me without problems, both with tf.keras.backend.sum and with tf.math.reduce_sum.

The answer is that your substitution doesn't alter your network or the result you are looking for. You can test it on your own and verify that tf.keras.backend.sum gives the same output as tf.math.reduce_sum (tf.reduce_sum is an alias for it):

import numpy as np
import tensorflow as tf

X = np.random.uniform(0, 1, (32, 100, 10)).astype('float32')

(tf.keras.backend.sum(X, axis=-2) == tf.reduce_sum(X, axis=-2)).numpy().all()  # True
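To see what the sum over axis=-2 is doing in the attention block from the question, here is a minimal NumPy sketch. It is only an illustration under assumptions: the sizes are made up, and random arrays stand in for the BiLSTM output and the Dense(1) scores. It mimics RepeatVector + Permute + Multiply and then reduces over the timestep axis, which should match a plain attention-weighted sum over timesteps.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, features = 4, 6, 10  # illustrative sizes, not from the question

# Stand-in for the BiLSTM output: (batch, timesteps, features)
activations = rng.uniform(0, 1, (batch, timesteps, features)).astype("float32")

# Stand-in for the Dense(1) + Flatten scores, followed by a softmax over timesteps
scores = rng.normal(size=(batch, timesteps)).astype("float32")
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

# RepeatVector(features) + Permute([2, 1]): copy each weight across the feature axis
repeated = np.repeat(weights[:, :, None], features, axis=2)  # (batch, timesteps, features)

# Multiply, then sum over axis=-2 (the timestep axis)
sent_representation = (activations * repeated).sum(axis=-2)  # (batch, features)

# The same quantity written directly as a weighted sum over timesteps
direct = np.einsum("bt,btf->bf", weights, activations)

print(sent_representation.shape)
print(np.allclose(sent_representation, direct, atol=1e-5))
```

Either way of writing the reduction collapses the timestep axis and leaves one vector per sample, which is why swapping one sum function for the other does not change the model.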

I also suggest wrapping the operation in a Lambda layer.

EDIT: using tf.reduce_sum or tf.keras.backend.sum wrapped in a Lambda layer does not raise an error with a TF version >= 2.2.

When building the model, you should use layers only. If you want to use TensorFlow ops (like tf.reduce_sum or tf.keras.backend.sum), you need to wrap them in a Keras Lambda layer. Without this the model can still work, but using Lambda is good practice and avoids future problems.
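A minimal sketch of that wrapping, assuming a TF version >= 2.2 as noted above. The input sizes are illustrative, and the model contains nothing but the Lambda layer so the effect of the wrapped op is easy to check against NumPy.

```python
import numpy as np
import tensorflow as tf

n_timesteps, n_features = 100, 10  # illustrative sizes

inputs = tf.keras.layers.Input(shape=(n_timesteps, n_features))
# Wrap the raw TensorFlow op in a Lambda layer so it is tracked as a Keras layer
summed = tf.keras.layers.Lambda(lambda t: tf.math.reduce_sum(t, axis=-2))(inputs)
model = tf.keras.models.Model(inputs=inputs, outputs=summed)

X = np.random.uniform(0, 1, (32, n_timesteps, n_features)).astype("float32")
out = model.predict(X, verbose=0)

print(out.shape)  # one summed feature vector per sample
print(np.allclose(out, X.sum(axis=-2), atol=1e-3))
```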

edited Aug 25 '20 at 07:38

answered Aug 24 '20 at 22:08

Do you mean something like this: sent_representation = tf.keras.layers.Lambda(lambda xin: tf.math.reduce_sum(xin, axis=-2), output_shape=(units*2,))(sent_representation). Because I get the same error. So, I guess the problem is with Lambda layer. Any idea how to fix this? – Miranda Aug 24 '20 at 22:22
I don't have any problem: https://colab.research.google.com/drive/1uT9oPhkS8ygEg0f1XGZk5XT7nOoHU-Kx?usp=sharing – Marco Cerliani Aug 24 '20 at 22:32
as before no problems, also after saving and loading models https://colab.research.google.com/drive/1asLmi54MXnMHtZKd7TPYdZ7H-xtNxMcv?usp=sharing – Marco Cerliani Aug 24 '20 at 22:56
Can you make a guess about the cause of this problem? – Miranda Aug 24 '20 at 22:58
but what is the error that raises using Lambda layer? – Marco Cerliani Aug 24 '20 at 23:01
ValueError: Unknown initializer: GlorotUniform – Miranda Aug 24 '20 at 23:04
what if you try a simple tf.keras.layers.LSTM instead of tf.compat.v1.keras.layers.CuDNNLSTM? but it seems strange – Marco Cerliani Aug 24 '20 at 23:12
I did try tf.keras.layers.LSTM as well and get the exact same error – Miranda Aug 24 '20 at 23:16
which is your tf version ? – Marco Cerliani Aug 24 '20 at 23:20
tensorflow version: 2.1.0 – Miranda Aug 24 '20 at 23:22
I downgraded to 2.1 on Colab and it raises an error (not the same as yours)... the last thing I can suggest is to upgrade to 2.3 (in my example it doesn't raise errors) – Marco Cerliani Aug 24 '20 at 23:37
I will try that and will let you know of the result – Miranda Aug 24 '20 at 23:47
I upgraded to tensorflow 2.2 (I am using conda and 2.3 is not available yet) and the problem is resolved. Can you please add the version upgrade to your answer, so that I can accept it as the answer? – Miranda Aug 25 '20 at 02:29
Also can you add something to your answer about why to wrap the operation with a Lambda layer? Is it gonna cause any issue if it is not applied? – Miranda Aug 25 '20 at 02:31
Nice to hear about this... I edited... don't forget to upvote and accept it ;-) – Marco Cerliani Aug 25 '20 at 07:39
