
I'm trying to train a convolutional neural network with triplet loss (more about triplet loss here) in order to generate face embeddings (128 values that accurately describe a face).

In order to select only semi-hard triplets (distance(anchor, positive) < distance(anchor, negative)), I first feed the whole mini-batch through the network and calculate the distances:

distance1, distance2 = sess.run([d_pos, d_neg], feed_dict={x_anchor:input1, x_positive:input2, x_negative:input3})

Then I select the indices of the inputs with distances that respect the formula above:

valids_batch = compute_valids(distance1, distance2, batch_size)

The function compute_valids:

def compute_valids(distance1, distance2, batch_size):
    # Collect the indices of triplets whose anchor-positive distance is
    # smaller than their anchor-negative distance
    # (batch_size is unused here but kept to match the call above).
    valids = []
    for q in range(len(distance1)):
        if distance1[q] < distance2[q]:
            valids.append(q)
    return valids
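
For reference, the same filter can be written more compactly with NumPy (a sketch, assuming distance1 and distance2 are 1-D arrays of per-triplet distances):

import numpy as np

def compute_valids_np(distance1, distance2):
    # Keep the indices where the anchor-positive distance is smaller
    # than the anchor-negative distance.
    d_pos = np.asarray(distance1)
    d_neg = np.asarray(distance2)
    return np.where(d_pos < d_neg)[0].tolist()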

Then I learn only from the training examples with indices returned by this filter function:

input1_valid = [input1[q] for q in valids_batch]
input2_valid = [input2[q] for q in valids_batch]
input3_valid = [input3[q] for q in valids_batch]

_, loss_value, summary = sess.run([optimizer, cost, summary_op], feed_dict={x_anchor:input1_valid, x_positive:input2_valid, x_negative:input3_valid})

Where optimizer is defined as:

model1 = siamese_convnet(x_anchor)

model2 = siamese_convnet(x_positive)

model3 = siamese_convnet(x_negative)

d_pos = tf.reduce_sum(tf.square(model1 - model2), 1)
d_neg = tf.reduce_sum(tf.square(model1 - model3), 1)

cost = triplet_loss(d_pos, d_neg)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)
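
For context, a typical hinge-style triplet loss over these squared distances looks like the sketch below (not necessarily the exact implementation used here; the margin value is illustrative):

import tensorflow as tf

def triplet_loss(d_pos, d_neg, margin=0.2):
    # The loss is zero once the negative is at least `margin` farther
    # away from the anchor than the positive.
    return tf.reduce_mean(tf.maximum(0.0, d_pos - d_neg + margin))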

But something is wrong because accuracy is very low (50%).

What am I doing wrong?


2 Answers


There are a lot of reasons why your network is performing poorly. From what I understand, your triplet generation method is fine. Here are some tips that may help improve your performance.

The model

In deep metric learning, people usually start from models pre-trained on the ImageNet classification task, as these models are quite expressive and generate good representations for images. You can fine-tune your model on top of these pre-trained models, e.g., VGG16, GoogLeNet, ResNet.
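
A minimal sketch of this idea (assuming the Keras API bundled with TensorFlow; the input size and embedding head are illustrative):

import tensorflow as tf

# VGG16 backbone pre-trained on ImageNet, with a 128-d embedding head on top.
base = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                   pooling='avg', input_shape=(224, 224, 3))
x = tf.keras.layers.Dense(128)(base.output)                       # 128-d face embedding
x = tf.keras.layers.Lambda(lambda t: tf.nn.l2_normalize(t, axis=1))(x)
embedding_net = tf.keras.Model(base.input, x)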

How to fine-tune

Even if you have a good pre-trained model, it is often difficult to directly optimize the triplet loss with these models on your own dataset. Since these pre-trained models are trained on ImageNet, if your dataset is vastly different from ImageNet, you can first fine-tune the model on a classification task on your dataset. Once your model performs reasonably well on the classification task on your custom dataset, you can use the classification model as the base network (maybe with a little tweaking) for the triplet network. This will often lead to much better performance.
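
A sketch of this two-stage idea (num_identities and the optimizer settings are assumptions for illustration):

import tensorflow as tf

# Stage 1: fine-tune the backbone as an identity classifier on your own face dataset.
num_identities = 1000   # number of distinct people in your training set
backbone = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                       pooling='avg', input_shape=(224, 224, 3))
logits = tf.keras.layers.Dense(num_identities, activation='softmax')(backbone.output)
classifier = tf.keras.Model(backbone.input, logits)
classifier.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                   loss='sparse_categorical_crossentropy')
# classifier.fit(images, identity_labels, ...)
# Stage 2: drop the softmax head and reuse `backbone` (plus an embedding head)
# as the shared network of the triplet model.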

Hyperparameters

Hyperparameters such as the learning rate, momentum, and weight decay are also extremely important for good performance (the learning rate is probably the most important factor). Since you are fine-tuning and not training the network from scratch, you should use a small learning rate, for example lr=0.001 or lr=0.0001. For momentum, 0.9 is a good choice. For weight decay, people usually use 0.0005 or 0.00005.
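
An illustrative TF 1.x setup with these values; weight decay is added here as explicit L2 regularisation, which is one common way to do it (the exact values are examples, and `cost` is the triplet loss from the question):

import tensorflow as tf

weight_decay = 5e-4
l2_loss = weight_decay * tf.add_n([tf.nn.l2_loss(v)
                                   for v in tf.trainable_variables()
                                   if 'bias' not in v.name])
total_loss = cost + l2_loss
train_op = tf.train.MomentumOptimizer(learning_rate=1e-3,
                                      momentum=0.9).minimize(total_loss)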

If you add some fully connected layers, the learning rate for these new layers can be higher than for the other layers (0.01, for example).
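
One TF 1.x way to achieve per-layer learning rates is to split the trainable variables by scope (the scope names 'base' and 'fc' below are assumptions) and drive each group with its own optimizer:

import tensorflow as tf

base_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='base')
fc_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='fc')
train_base = tf.train.AdamOptimizer(1e-4).minimize(cost, var_list=base_vars)
train_fc = tf.train.AdamOptimizer(1e-2).minimize(cost, var_list=fc_vars)
train_op = tf.group(train_base, train_fc)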

Which layers to fine-tune

As your network has several layers, you need to decide which layers to fine-tune. Researchers have found that the lower layers of a network produce generic features such as lines and edges. Typically, people freeze the lower layers and only update the weights of the upper layers, which tend to produce task-oriented features. You should try optimizing starting from different lower layers and see which setting performs best.
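
Freezing lower layers in TF 1.x can be done by simply leaving their variables out of the var_list passed to the optimizer (the scope name 'upper' below is an assumption); alternatively, create the frozen variables with trainable=False:

import tensorflow as tf

upper_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='upper')
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost, var_list=upper_vars)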

Reference

  1. Fast R-CNN (Section 4.5, which layers to fine-tune)
  2. Deep Image Retrieval (Section 5.2, influence of fine-tuning the representation)
  • Can you please provide more information on how to freeze the update on a specific layer? Another section in your answer where I need help is how to use different learning rates for different layers. Your answer is great, I will accept it right now. – Hello Lili Dec 04 '17 at 09:57
  • I think this is a framework-specific question and does not relate to the problem here. I am currently using `PyTorch`; as your post has the `Tensorflow` tag, I am afraid you will have to search Google or start a new question about how to achieve this in TensorFlow. – jdhao Dec 04 '17 at 10:14

distance(anchor, positive) < distance(anchor, negative)

This selects triplets in which the similarity between anchor and positive is greater than between anchor and negative, which is the opposite of a hard triplet. For hard triplets you need examples where d(a,p) > d(a,n). For semi-hard triplets, you need examples that satisfy d(a,p) < d(a,n) < d(a,p) + margin.
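
A sketch of a corrected filter that keeps only semi-hard triplets (the margin value must match the one used in your triplet loss):

import numpy as np

def compute_semi_hard(distance1, distance2, margin=0.2):
    # Keep indices where d(a,p) < d(a,n) < d(a,p) + margin.
    d_pos = np.asarray(distance1)
    d_neg = np.asarray(distance2)
    mask = (d_pos < d_neg) & (d_neg < d_pos + margin)
    return np.where(mask)[0].tolist()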

Here is the explanation : https://stackoverflow.com/a/49314187/7693521

I hope I am correct about this; if not, please correct me.