
I'm trying to build a CNN to classify dogs. My dataset consists of 5 classes of dogs, with 50 images split into 40 for training and 10 for testing. I've fine-tuned the pretrained AlexNet model for 100,000 and for 140,000 iterations, but the accuracy always stays between 20% and 30%. I adapted AlexNet to my problem as follows: I renamed the last fully connected layer and set its num_output to 5, and I also renamed the first fully connected layer (fc6).

So why does this model fail even though I've used data augmentation (cropping)?

Should I train a linear classifier on the top layer of my network, since I have little data and it is similar to AlexNet's dataset (as mentioned here: transfer learning)? Or is my dataset very different from AlexNet's original dataset, so that I should train a linear classifier on an earlier layer of the network?

Here is my solver :

net: "models/mymodel/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 200000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "models/mymodel/my_model_alex_net_train"
solver_mode: GPU
  • What is your batch size? What are the training parameters of your **solver** file? There are many other considerations, but those are a good start. – Prune Apr 11 '17 at 18:09

1 Answer


Although you haven't given us much debugging information, I suspect that you've done some serious over-fitting. In general, a model's "sweet spot" for training depends on epochs, not iterations. Single-node AlexNet and GoogLeNet train in 50-90 epochs on an ILSVRC-style dataset. Even if your batch size is as small as 1, you've trained for 2,500 epochs with only 5 classes. With only 8 images per class, the AlexNet topology is serious overkill and has likely adapted to each individual photo.
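To make the epoch arithmetic concrete, here is a minimal sketch. An epoch is one full pass over the training set, and each iteration processes one batch, so epochs = iterations × batch_size ÷ dataset_size. The batch size here is an assumption (it would come from your train_val.prototxt); the point is that even batch size 1 already gives an enormous epoch count on 40 images:

```python
def epochs_trained(iterations, batch_size, dataset_size):
    """One iteration = one batch; one epoch = one full pass over the data."""
    return iterations * batch_size / dataset_size

# Your smaller run: 100,000 iterations, assumed batch size 1, 40 training images.
print(epochs_trained(100_000, 1, 40))  # 2500.0 epochs -- vs. the usual 50-90
```

Any realistic batch size larger than 1 only makes this number bigger.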

Consider this: you have only 40 training photos, but 96 kernels in the first convolution layer and 256 in the second. This means your model can devote over 2 kernels in conv1 and 6 in conv2 to each photograph! You get no commonality of features, no averaging ... instead of edge detection generalizing to finding faces, you're going to have dedicated filters tuned to the individual photos.
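The kernel counts above are standard AlexNet; a one-liner shows the kernels-per-photo ratio:

```python
# Standard AlexNet kernel counts for the first two convolution layers.
conv1_kernels, conv2_kernels = 96, 256
train_images = 40  # total training photos in this question

print(conv1_kernels / train_images)  # 2.4 conv1 kernels available per photo
print(conv2_kernels / train_images)  # 6.4 conv2 kernels available per photo
```

With that much capacity per image, memorizing each photo is easier for the network than learning breed-level features.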

In short, your model is trained to find "Aunt Polly's dog on a green throw rug in front of the kitchen cabinet with a patch of sun to the left." It doesn't have to learn to discriminate a basenji from a basset, just to recognize whatever is randomly convenient in each photo.

Prune
  • Would you explain the difference between epochs and iterations, and how to set the number of epochs? – user7417788 Apr 12 '17 at 11:50
  • Here, let me search that for you ... [answer](http://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks#31842945). You change the number of iterations in your solver.prototxt file. – Prune Apr 12 '17 at 20:26
  • Thank you for your help. So the solution to my problem is to get more data? – user7417788 Apr 13 '17 at 09:26
  • More data is one possibility; there is no *one* solution. The best-fit solution depends on your application and paradigm. – Prune Apr 13 '17 at 15:19