
I'm applying a Siamese bidirectional LSTM (BiLSTM) to long texts, using character-level sequences and embeddings. The embedding model is Word2vec, the sequence length is set to None to handle variable sequence lengths (180-550), and the batch size is 8. The model was trained with Keras on the TensorFlow backend for 100 epochs. The similarity between the outputs of the left-side and right-side networks is measured with the Manhattan distance:

from keras import backend as K

def manhattan_distance(left, right):
    return K.exp(-K.sum(K.abs(left - right), axis=1, keepdims=True))  # exp(-L1 distance), in (0, 1]
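To make clear what this function returns, here is a minimal plain-Python sketch of the same formula for a single pair of vectors. The name `manhattan_similarity` and the example vectors are made up for illustration and are not part of my model; the point is only that identical encodings give a score of 1 and the score decays toward 0 as the L1 distance grows.

```python
import math

def manhattan_similarity(left, right):
    """Return exp(-L1 distance) for two equal-length vectors; the result lies in (0, 1]."""
    l1_distance = sum(abs(a - b) for a, b in zip(left, right))
    return math.exp(-l1_distance)

# Hypothetical encodings, chosen only to show the two extremes.
identical = manhattan_similarity([0.2, -0.5, 1.0], [0.2, -0.5, 1.0])
far_apart = manhattan_similarity([0.2, -0.5, 1.0], [1.2, 0.5, -1.0])
```

Because the score depends only on the two encodings, any difference between runs must come from the encodings themselves (for example, from how the inputs are prepared or how the model is loaded), not from the distance function.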

Now, evaluate.py loads the .h5 model file and writes the results to a CSV file. The problem is that, for the same test data, the results differ substantially between the first and the second time I run the script. How can I make the results consistent across runs? For example, if I get a similarity score of 90% between 10.txt and 20.txt on the first run, how can I get something close to 90% on the second, third, and later runs?

P.S.: The attached image shows the first-run results in the third column and the second-run results in the fourth column.
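To put a number on how far two runs diverge, something like the sketch below could be used. It assumes the two score columns have already been read into lists of floats in the same row order; the function name and the sample values are hypothetical and are not my actual results.

```python
def max_abs_difference(first_run, second_run):
    """Return the largest absolute gap between scores for the same test pair."""
    if len(first_run) != len(second_run):
        raise ValueError("both runs must contain the same number of scores")
    return max(abs(a - b) for a, b in zip(first_run, second_run))

# Made-up scores for three test pairs, for illustration only.
first_run = [0.90, 0.15, 0.62]
second_run = [0.88, 0.40, 0.61]
worst_gap = max_abs_difference(first_run, second_run)
```

A gap near zero would mean the runs agree up to floating-point noise; a large gap like the one in these made-up values is the kind of disagreement I am seeing.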

Yoskutik
MManahi
