
I have trained a sequential model in Keras, with sparse vectors as inputs (padded_inputs_multil for training and padded_inputs_tr for testing) and dense vectors as outputs (target_multil_array for training and target_tr_r_array for testing):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

model_mul = keras.models.Sequential()
model_mul.add(keras.layers.LSTM(units=172, batch_input_shape=(None, 29, 22), dropout=0.2, recurrent_dropout=0.2, return_sequences=False))
model_mul.add(Dense(300, activation='tanh'))

model_mul.compile(loss='cosine_similarity', optimizer='adam', metrics=[tf.keras.metrics.CosineSimilarity(axis=1)])
model_mul.summary()

history_mul=model_mul.fit(padded_inputs_multil, target_multil_array, epochs=1, validation_data=(padded_inputs_tr, target_tr_r_array))

And I get a validation cosine similarity of 0.4607, as shown in the following output:

Train on 794870 samples, validate on 199108 samples
Epoch 1/1
794870/794870 [==============================] - 2694s 3ms/step - loss: -0.4678 - cosine_similarity: 0.4522 - val_loss: -0.4152 - val_cosine_similarity: 0.4607

However, when I evaluate the model, I get a lower value of cosine similarity:

results_mul = model_mul.evaluate(padded_inputs_tr, target_tr_r_array)
print(results_mul)
[-0.4152175833690755, 0.44675499200820923]

Then the bigger problem: if I compute the predicted vectors and compare them with the target vectors, I get a mean cosine similarity that is much, much lower (only slightly above 0.40). I can't understand why, since the TensorFlow documentation says that CosineSimilarity computes the mean cosine similarity between predictions and labels.
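
(For reference, this is how I understand the metric when used stand-alone; the toy values below are just an illustration, not my data. It accumulates the mean cosine similarity over all samples passed to update_state.)

m = tf.keras.metrics.CosineSimilarity(axis=1)
m.update_state([[0., 1.], [1., 1.]], [[1., 0.], [1., 1.]])
print(m.result().numpy())  # 0.5: mean of 0 (orthogonal pair) and 1 (identical pair)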

import numpy as np
import pandas as pd
import statistics

prediction_mul = model_mul.predict(padded_inputs_tr)

column_names = ['prediction_multil', 'target_multil', 'cos_pred_target']
df = pd.DataFrame(columns=column_names)
df['prediction_multil'] = [vec for vec in prediction_mul]
df['target_multil'] = [vec for vec in target_tr_r_array]

# cosine similarity between two 1-D vectors
def cos_sim(a, b):
    dot_product = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    return dot_product / (norm_a * norm_b)

cos = []
for index, row in df.iterrows():
    cos.append(cos_sim(row['prediction_multil'], row['target_multil']))
df['cos_pred_target'] = cos
statistics.mean(df['cos_pred_target'])
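
The same mean can also be computed in one vectorized step (a quicker equivalent of the loop above, assuming prediction_mul and target_tr_r_array are both 2-D float arrays of shape (n_samples, 300)):

pred = np.asarray(prediction_mul, dtype=np.float64)
targ = np.asarray(target_tr_r_array, dtype=np.float64)
# row-wise cosine similarity, then the mean over all samples
cos_all = np.sum(pred * targ, axis=1) / (np.linalg.norm(pred, axis=1) * np.linalg.norm(targ, axis=1))
print(cos_all.mean())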

Do you know what I might be doing wrong? Thanks in advance :)

  • Hi there! In Keras, the "cosine_similarity" [loss](https://www.tensorflow.org/api_docs/python/tf/keras/losses/cosine_similarity) should converge towards -1, while the "CosineSimilarity" [metric](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/CosineSimilarity) should go towards 1, right? – isydmr Jun 09 '20 at 07:55
  • @isydmr Yes, but what surprised me was not the discrepancy between the loss and the metric, but the discrepancy between the value of the metric when fitting and evaluating the model and the mean cosine similarity computed between the target and predicted vectors – a_gdevr Jun 12 '20 at 09:06

1 Answer


According to the cosine_similarity documentation, the default axis value is axis=-1,

and, as answered here, axis=-1 means the similarity is taken along the last axis of the shape. So in your case axis=-1 is equivalent to axis=2.

So the result is that the loss is computed on one axis and the metric on another:

model_mul.compile(loss='cosine_similarity', optimizer='adam', metrics=[tf.keras.metrics.CosineSimilarity(axis=1)])
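
As a small sketch of what the axis argument does (toy tensors, not the question's data), the same pair of 2-D tensors gives different values depending on whether the similarity is taken along the rows or down the columns:

import tensorflow as tf

y_true = tf.constant([[0., 1., 1.], [1., 1., 0.]])
y_pred = tf.constant([[1., 0., 1.], [1., 1., 0.]])

# axis=-1: similarity along the last axis, i.e. one value per row (per sample)
print(tf.keras.losses.cosine_similarity(y_true, y_pred, axis=-1).numpy())  # [-0.5, -1.]
# axis=0: similarity down the columns instead, i.e. one value per feature
print(tf.keras.losses.cosine_similarity(y_true, y_pred, axis=0).numpy())   # approx. [-0.71, -0.71, -1.]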