Related to: How to Create CaffeDB training data for siamese networks out of image directory

If I have N labels, how can I enforce that the feature vector of size N right before the contrastive loss layer represents some kind of probability for each class? Or does that come automatically with the Siamese net design?

Feuerteufel

1 Answer

If you only use contrastive loss in a Siamese network, there is no way to force the net to classify into the correct label, because the net is trained using only "same/not same" information and does not know the semantics of the different classes.

What you can do is train with multiple loss layers.
You should aim at training a feature representation that is rich enough for your domain, so that looking at the trained feature vector of some input (in some high dimension) you can easily classify that input into the correct class. Moreover, given the feature representations of two inputs, one should be able to easily say whether they are "same" or "not same".
Therefore, I recommend that you train your deep network with two loss layers, both fed (as "bottom") by the output of one of the "InnerProduct" layers. One loss is the contrastive loss. The other branch should add another "InnerProduct" layer with num_output: N followed by a "SoftmaxWithLoss" layer.
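
For concreteness, here is a minimal prototxt sketch of that fork on one branch of the Siamese net (the layer/blob names `fc7`, `fc7_p`, `sim`, the `num_output` values, and the loss weights are assumptions - adapt them to your net):

```
# fc7 is the feature layer of the first branch; fc7_p is its twin
# from the second, weight-shared branch of the Siamese net.

# Classification head: fc7 -> fc8 -> SoftmaxWithLoss
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param { num_output: 10 }  # num_output = N labels
}
layer {
  name: "class_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "class_loss"
  loss_weight: 1.0  # weight of the classification objective
}

# Verification head: contrastive loss on the paired fc7 features
layer {
  name: "contrastive_loss"
  type: "ContrastiveLoss"
  bottom: "fc7"
  bottom: "fc7_p"
  bottom: "sim"     # binary label: 1 = "same" pair, 0 = "not same"
  top: "contrastive_loss"
  loss_weight: 1.0  # raise or lower to trade off the two objectives
  contrastive_loss_param { margin: 1.0 }
}
```

Caffe sums every top blob that carries a `loss_weight` into the overall objective, so both losses are optimized jointly and their relative weights control the trade-off.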

A similar concept was used in this work: Sun, Chen, Wang and Tang, "Deep Learning Face Representation by Joint Identification-Verification", NIPS 2014.

Shai
  • So I have some kind of fork in my net, ending in those two loss layers? For example, fc7 as input for the contrastive loss and another layer fc8 as input for the softmax loss, where fc8 itself takes fc7 as input. – Feuerteufel Jan 25 '16 at 13:19
  • @Feuerteufel exactly! Note that you need to add a `loss_weight` parameter to each of the loss layers - they can have different effects on the overall loss. You can give preference to the contrastive loss, or vice versa. – Shai Jan 25 '16 at 13:21
  • Is there a (dis-)advantage to giving both the same weight? In the end I want a good classifier. And could both loss layers take fc8 as input, or is an additional layer necessary before the Softmax layer? – Feuerteufel Jan 25 '16 at 14:52
  • @Feuerteufel (1) Regarding the `loss_weight`, I guess you'll have to try and see whether one loss dominates the process... (2) It is more general to train the contrastive loss on `fc7`, but I can see it working on `fc8` as well. I guess you'll have to try and see. It would be nice if you could come back here and post your insights to share with us. – Shai Jan 25 '16 at 14:57
  • I generated some CaffeDBs, but when I try to run, it crashes on creating the Python layer. I activated WITH_PYTHON_LAYER := 1 in the Makefile.config and rebuilt it. Do I need to do something else? – Feuerteufel Jan 28 '16 at 08:55
  • @Feuerteufel "it crashes" - where? why? what error message did you get? I suppose this should be asked in a new question... – Shai Jan 28 '16 at 09:19
  • Got it - I forgot a `make clean` before rebuilding Caffe; now it's working. – Feuerteufel Jan 28 '16 at 09:39
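
For reference, a minimal sketch of the rebuild sequence that resolves this kind of crash (the `-j8` flag is an assumption; use whatever suits your machine):

```
# In Makefile.config, make sure Python layers are enabled:
#   WITH_PYTHON_LAYER := 1
# Then rebuild from scratch - a stale build can crash when the
# net tries to create a Python layer:
make clean
make all -j8
make pycaffe
```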