Here is a short overview of my triplet-learning setup. I'm using three convolutional neural networks with shared weights to generate face embeddings (anchor, positive, negative), with the loss shown below.
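Concretely, the weight sharing just means the same embedding network is applied to all three inputs. A minimal sketch of what I mean (the embed_net definition and the input shapes here are only illustrative, not my actual model):

import tensorflow as tf

def embed_net(x):
    # toy embedding network, just for illustration; the real model is a deeper CNN
    h = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu, name="conv1")
    h = tf.layers.flatten(h)
    return tf.layers.dense(h, 128, name="fc")

# placeholder shapes are examples only
anchor_images   = tf.placeholder(tf.float32, [None, 96, 96, 3])
positive_images = tf.placeholder(tf.float32, [None, 96, 96, 3])
negative_images = tf.placeholder(tf.float32, [None, 96, 96, 3])

# apply the same network three times; reuse_variables() makes the weights shared
with tf.variable_scope("embedding") as scope:
    anchor_output = embed_net(anchor_images)        # shape [None, 128]
    scope.reuse_variables()
    positive_output = embed_net(positive_images)    # shape [None, 128]
    negative_output = embed_net(negative_images)    # shape [None, 128]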
Triplet loss:
import tensorflow as tf

anchor_output = ...    # shape [None, 128]
positive_output = ...  # shape [None, 128]
negative_output = ...  # shape [None, 128]
margin = ...           # scalar margin hyperparameter

# squared L2 distances between anchor/positive and anchor/negative embeddings
d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)

# hinge on the margin, then average over the batch
loss = tf.maximum(0., margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)
When I select only the hard triplets (those where distance(anchor, positive) < distance(anchor, negative)), the loss is very small: 0.08.
When I select all triplets, the loss becomes larger: 0.17855. These are just test values for 10,000 triplets, but I get similar results on the actual set (600,000 triplets).
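To be clear, by "selecting" I mean something along these lines before the final averaging (just a sketch of the filtering, not necessarily my exact code):

mask = d_pos < d_neg                                    # triplets that pass the selection
selected = tf.boolean_mask(margin + d_pos - d_neg, mask)
loss = tf.reduce_mean(tf.maximum(0., selected))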
Why does this happen? Is it correct?
I'm using SGD with momentum, starting with a learning rate of 0.001.
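The optimizer setup looks roughly like this (the momentum value shown is only an example, not my exact hyperparameter):

optimizer = tf.train.MomentumOptimizer(learning_rate=0.001, momentum=0.9)
train_op = optimizer.minimize(loss)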