I've trained a simple GRU with an attention layer, and now I'm trying to visualize the attention weights (I've already obtained them). The input is two one-hot encoded sequences: one is correct, and the other is almost the same but with some letters permuted. The task is to determine which of the two sequences is the correct one. Here's my NN:
import keras
from keras.layers import GRU, Dropout, Flatten, Dense
from keras_self_attention import SeqSelfAttention

optimizer = keras.optimizers.RMSprop()
max_features = 4  # number of symbols in the dictionary
num_classes = 2

model = keras.Sequential()
model.add(GRU(128, input_shape=(70, max_features), return_sequences=True, activation='tanh'))
model.add(Dropout(0.5))
# model.add() returns None, so keep a handle on the layer object itself
atn_layer = SeqSelfAttention()
model.add(atn_layer)
model.add(Flatten())
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
I've tried several approaches found on StackOverflow, but without success. In particular, I don't understand how to couple my input with the attention weights. I'd appreciate any help and suggestions.
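For what it's worth, here is a minimal sketch of the coupling step I have in mind, using only NumPy. It assumes a hypothetical 4-letter vocabulary (adjust to your data) and that the attention weights for one sequence are either a (70, 70) self-attention matrix (as e.g. SeqSelfAttention with return_attention=True would give) or an already-reduced (70,) vector. The idea is to decode each one-hot position back to its letter via argmax and pair it with a per-position attention score:

```python
import numpy as np

# Hypothetical vocabulary for the 4 one-hot features; replace with your own mapping.
VOCAB = np.array(list("ABCD"))

def attention_per_position(x_onehot, attn):
    """Pair each input position's letter with an attention score.

    x_onehot: (seq_len, max_features) one-hot encoded sequence.
    attn:     (seq_len, seq_len) self-attention matrix for this sequence,
              or a (seq_len,) attention vector.
    Returns a list of (letter, score) pairs, one per time step.
    """
    # Decode one-hot vectors back to letters: argmax picks the hot index.
    tokens = VOCAB[x_onehot.argmax(axis=-1)]
    if attn.ndim == 2:
        # Average over the query axis: how strongly each position is attended to.
        scores = attn.mean(axis=0)
    else:
        scores = attn
    return list(zip(tokens, scores))
```

With these pairs, a bar chart (e.g. matplotlib's `plt.bar(range(len(pairs)), scores)` with the letters as x-tick labels) or a heatmap of the full (70, 70) matrix makes the per-letter attention visible.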