I have developed a paraphrase detection model (Yes/No) that takes two phrases as input and is supposed to return whether one is a paraphrase of the other.
Based on the suggestions here, I ensured that there is no class imbalance in the training dataset:
This is my model:
left_input = Input(shape=(120, ))
right_input = Input(shape=(120, ))
left_embedding = Embedding(vocab_size, 120, input_length=max_length)(left_input)
right_embedding = Embedding(vocab_size, 120, input_length=max_length)(right_input)
left_lstm = LSTM(120, input_shape=(1, 120))(left_embedding)
right_lstm = LSTM(120, input_shape=(1, 120))(right_embedding)
concat = concatenate([left_lstm, right_lstm], name='Concatenate')
model_output = Dense(1, activation='softmax')(concat)
model = Model(inputs=[left_input, right_input], outputs=model_output, name='Final_output')
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
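As a quick sanity check on the output layer (`Dense(1, activation='softmax')`), here is a minimal NumPy sketch of how softmax and sigmoid behave on a single output unit. It is not part of my model: the `softmax` and `sigmoid` helpers and the `logits` values are made up purely for illustration. Softmax normalizes across the units of the layer, so with only one unit it returns 1.0 whatever the input is, while sigmoid maps each value independently into (0, 1).

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis (the units of the layer).
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(logits):
    # Element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical raw outputs of a Dense(1) layer for four samples, shape (4, 1).
logits = np.array([[-3.0], [0.0], [0.5], [4.0]])

softmax_out = softmax(logits)  # one unit per row, so every entry is 1.0
sigmoid_out = sigmoid(logits)  # varies with the input value

print("softmax:", softmax_out.ravel())
print("sigmoid:", sigmoid_out.ravel())
```

This only illustrates the activation functions themselves; it does not say anything about what else might be affecting the trained model's predictions.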
These are the predictions my model made:
Can anybody point out what the problem is here?
Update-1: By replacing "softmax" with "sigmoid", I get the following values (all the same):