I was trying to use the attention model described here in a simple bidirectional lstm model. However, after adding the attention model, I got this error:
ValueError: Unknown initializer: GlorotUniform
To begin with, my code didn't have any incompatibility issue in terms of using TensorFlow in some part and Keras in other parts of the code. I also tried every solution addressed in this post. However, none of them worked for me. I must mention that my code worked with no issues before adding the attention model. So, I tried removing every line of the attention part of the network structure to see what line is causing this problem:
inputs = tf.keras.layers.Input(shape=(n_timesteps, n_features))
units = 50
activations = tf.keras.layers.Bidirectional(tf.compat.v1.keras.layers.CuDNNLSTM(units,
return_sequences=True),
merge_mode='concat')(inputs)
print(np.shape(activations))
# Implementation of attention
x1 = tf.keras.layers.Dense(1, activation='tanh')(activations)
print(np.shape(x1))
x1= tf.keras.layers.Flatten()(x1)
print(np.shape(x1))
x1= tf.keras.layers.Activation('softmax')(x1)
print(np.shape(x1))
x1=tf.keras.layers.RepeatVector(units*2)(x1)
print(np.shape(x1))
x1 = tf.keras.layers.Permute([2,1])(x1)
print(np.shape(x1))
sent_representation = tf.keras.layers.Multiply()([activations, x1])
print(np.shape(sent_representation))
sent_representation = tf.keras.layers.Lambda(lambda xin:tf.keras.backend.sum(xin, axis=-2),
output_shape=(units*2,))(sent_representation)
# softmax for classification
x = tf.keras.layers.Dense(n_outputs, activation='softmax')(sent_representation)
model = tf.keras.models.Model(inputs=inputs, outputs=x)
I realized it is the line with lambda function and tf.keras.backend.sum that is causing the error. So, after some search I decided to replace that line with the following:
sent_representation = tf.math.reduce_sum(sent_representation, axis=-2)
Now, my code works. However, I am not quite sure if this substitution is correct. Am I doing this right?
Edit: Here is the next lines of the code, the problem is caused when I try to load the best model for testing:
optimizer = tf.keras.optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9)
model.compile(loss=lossFunction, optimizer=optimizer, metrics=['accuracy'])
print(model.summary())
# early stopping
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min',
verbose=1, patience=20)
mc = tf.keras.callbacks.ModelCheckpoint('best_model.h5',
monitor='val_accuracy', mode='max', verbose=1,
save_best_only=True)
history = model.fit(trainX, trainy, validation_data=(valX, valy),
shuffle = True, epochs=epochs, verbose=0,
callbacks=[es, mc])
saved_model = tf.keras.models.load_model('best_model.h5',
custom_objects={"GlorotUniform": tf.keras.initializers.glorot_uniform()})
# evaluate the model
_, train_acc = saved_model.evaluate(trainX, trainy, verbose=0) # saved_model
_, val_acc = saved_model.evaluate(valX, valy, verbose=0) # saved_model
_, accuracy = saved_model.evaluate(testX, testy, verbose=0) # saved_model
print('Train: %.3f, Validation: %.3f, Test: %.3f' % (train_acc, val_acc, accuracy))
y_pred = saved_model.predict(testX, batch_size=64, verbose=1)
Do you see any problem in my code that might be the cause of the error that I get when I use Lambda layer?