
I am relatively new to this domain. Currently I have three models:

  1. Model #1: trained from scratch, but using the GoogLeNet network architecture
  2. Model #2: transfer learning (fine-tuning): retrain only the last layer of GoogLeNet, using the GoogLeNet model as the initial weights
  3. Model #3: transfer learning (fine-tuning): retrain all layers, using the GoogLeNet model as the initial weights

Models #2 and #3 showed promising validation accuracy (about 79%-80%); however, when I evaluated them on the testing set (30,000 photos), both performed poorly (error rate: 95%), while Model #1 achieved 70% accuracy on the same test set.

I am afraid that, due to my lack of knowledge in this domain, I have done something wrong when fine-tuning the existing model (GoogLeNet). Following guidance I found on Quora and Stack Overflow, I modified train.prototxt. To get Model #2, I set all the lr_mult and decay_mult values to 0 (zero) except in the last layer (in my case, the fc layer). To get Model #3, I kept all the layers and only modified the last layer, changing its num_output. Then I trained the model with the following command:

/root/caffe/build/tools/caffe train --solver my_solver.prototxt --weights googlenet_places365.caffemodel --gpu=0,1,2,3
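To make the prototxt changes concrete, here is a minimal sketch of the two edits described above. The layer names and parameters are illustrative (taken from the standard GoogLeNet layer naming; your train.prototxt may differ), so treat this as an example, not my exact file:

```protobuf
# Model #2: freeze an existing layer by zeroing both multipliers.
layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param { lr_mult: 0 decay_mult: 0 }  # weights frozen
  param { lr_mult: 0 decay_mult: 0 }  # biases frozen
  convolution_param {
    num_output: 64
    kernel_size: 7
    stride: 2
  }
}

# Model #3: keep all layers trainable and adapt only the classifier.
# Note: giving the last layer a NEW name (e.g. "loss3/classifier_new")
# makes Caffe reinitialize it, instead of trying to load the pretrained
# weights whose shape no longer matches the new num_output.
layer {
  name: "loss3/classifier_new"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "loss3/classifier_new"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 10  # number of classes in my new task (illustrative)
  }
}
```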

My questions are:

  1. Are the steps I described above (for getting Models #2 and #3) correct? Did I miss any important steps?
  2. Why is the validation accuracy high during training, while the models perform so poorly on the test data (overfitting)? What could be the root causes of this overfitting?

FYI, I have 4 million photos divided into three sets: a training set (70% of the photos), a validation set (20%), and a testing set (10%).

Thank you very much. I would be really glad to hear any guidance, comments, or information from you.

