I'm applying a Siamese Bidirectional LSTM (BiLSTM) to long texts, using character-level sequences and embeddings. The embeddings come from a Word2vec model, the input sequence length is set to None so the network can handle variable lengths (180-550 characters), the batch size is 8, and the model is trained with Keras on the TensorFlow backend for 100 epochs. The similarity metric between the left-side and right-side networks is based on the Manhattan distance (exponentiated negative L1 distance):
from keras import backend as K

def manhattan_distance(left, right):
    # Exponentiated negative Manhattan (L1) distance: a similarity score in (0, 1]
    return K.exp(-K.sum(K.abs(left - right), axis=1, keepdims=True))
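For reference, the two branches are wired roughly as in the minimal sketch below. This is not my exact training code: build_siamese_bilstm, vocab_size, embedding_dim, embedding_matrix (built from the Word2vec vectors) and lstm_units are placeholder names/values.

from keras.layers import Input, Embedding, Bidirectional, LSTM, Lambda
from keras.models import Model

def build_siamese_bilstm(vocab_size, embedding_dim, embedding_matrix, lstm_units=50):
    # shape=(None,) so each batch can carry its own sequence length (texts are 180-550 chars)
    left_input = Input(shape=(None,), dtype='int32')
    right_input = Input(shape=(None,), dtype='int32')

    # Character embeddings initialised from the Word2vec vectors, shared by both branches
    embed = Embedding(vocab_size, embedding_dim, weights=[embedding_matrix])
    encoder = Bidirectional(LSTM(lstm_units))

    left_encoded = encoder(embed(left_input))
    right_encoded = encoder(embed(right_input))

    # Similarity head: the exponentiated negative Manhattan distance defined above
    similarity = Lambda(lambda t: manhattan_distance(t[0], t[1]))([left_encoded, right_encoded])
    return Model(inputs=[left_input, right_input], outputs=similarity)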
Now, evaluate.py loads the saved .h5 model file and writes the results to a CSV file (a simplified sketch is below). The problem is that the results are very different between the first and second time I run the script on the same test data! How can I make the results reproducible, i.e., consistent across runs? For example, if I get a similarity score of 90% between 10.txt and 20.txt the first time, I'd like to get something near 90% on the second/third/etc. runs.
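For context, evaluate.py does roughly the following. This is a simplified sketch, not the real script: the model filename, results.csv, char_index, to_char_ids and the pair list are placeholders, and char_index must be the same character-to-index mapping used at training time. custom_objects is passed so Keras can resolve the custom manhattan_distance function when loading the saved model.

import csv
import numpy as np
from keras.models import load_model

model = load_model('siamese_bilstm.h5',
                   custom_objects={'manhattan_distance': manhattan_distance})

char_index = {}  # placeholder: must be the exact char -> id mapping used during training

def to_char_ids(text):
    # Map each character to its vocabulary index (0 for unknown characters)
    return [char_index.get(c, 0) for c in text]

pairs = [('10.txt', '20.txt')]  # placeholder list of test pairs

with open('results.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for left_file, right_file in pairs:
        left = np.array([to_char_ids(open(left_file, encoding='utf-8').read())])
        right = np.array([to_char_ids(open(right_file, encoding='utf-8').read())])
        score = float(model.predict([left, right])[0][0])
        writer.writerow([left_file, right_file, score])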
P.S.: The attached screenshot shows the first-run results in the third column and the second-run results in the fourth column.