
My team is training a CNN in TensorFlow for binary classification of damaged/acceptable parts. We created our code by modifying the cifar10 example code. In my prior experience with neural networks, I always trained until the loss was very close to 0 (well below 1). However, we are now evaluating our model with a validation set during training (on a separate GPU), and it seems like the precision stopped increasing after about 6.7k steps, while the loss is still dropping steadily after over 40k steps. Is this due to overfitting? Should we expect to see another spike in accuracy once the loss is very close to zero? The current max accuracy is not acceptable. Should we kill the run and keep tuning? What do you recommend? Here is our modified code, along with graphs of the training process.

https://gist.github.com/justineyster/6226535a8ee3f567e759c2ff2ae3776b

[Images: precision and loss during training]

Justin Eyster

3 Answers


A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider label 1, predictions 0.2, 0.4, and 0.6 at timesteps 1, 2, 3, and a classification threshold of 0.5. Timesteps 1 and 2 will produce a decrease in loss but no increase in accuracy.
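A quick numeric sketch of that example (plain NumPy, values taken from the scenario above):

```python
import numpy as np

# Label 1, predictions at timesteps 1-3, classification threshold 0.5.
y_true = 1.0
preds = [0.2, 0.4, 0.6]

for t, p in enumerate(preds, start=1):
    bce = -np.log(p)                     # binary cross-entropy when y_true = 1
    correct = int((p >= 0.5) == y_true)  # 1 if thresholded prediction matches
    print(f"t={t}: loss={bce:.3f}, correct={correct}")

# t=1: loss=1.609, correct=0
# t=2: loss=0.916, correct=0   <- loss fell, accuracy did not move
# t=3: loss=0.511, correct=1
```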

Ensure that your model has enough capacity by first overfitting the training data. If the model is overfitting the training data, reduce the overfitting with regularization techniques such as dropout, L1/L2 regularization, and data augmentation.
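For example, in a current Keras-style TensorFlow model, dropout and L2 weight penalties can be combined like this (a minimal sketch; the layer sizes and rates are illustrative placeholders, not taken from the question's code):

```python
import tensorflow as tf

# Minimal sketch: dropout + L2 regularization in a small binary-classification CNN.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(
        32, 3, activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 weight penalty
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.5),        # randomly zero 50% of activations
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(
        64, activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```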

Last, confirm your validation data and training data come from the same distribution.

rafaelvalle
  • Came to your answer after trying to fit a NN on whole-black images with 3 classes. The classifier learns to output probability 33% for every class, LOL. So the loss decreases from 7 to 1, but the accuracy stays at 33%! – Andrei Margeloiu May 31 '20 at 15:18

Well, I faced a similar situation when I used a Softmax function in the last layer instead of a Sigmoid for binary classification.

My validation loss and training loss were decreasing, but the accuracy of both remained constant. This taught me why sigmoid is used for binary classification.
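One way this failure shows up: softmax applied to a single output unit always returns 1.0, so the predicted class can never change even while the logit (and hence the loss) keeps moving. A small sketch to illustrate (assuming TensorFlow 2's eager mode; the answerer's exact architecture is not shown):

```python
import tensorflow as tf

logit = tf.constant([[2.7]])

# Softmax over a single unit is constant 1.0 for any logit value,
# so accuracy is frozen while the loss can still change:
print(tf.nn.softmax(logit).numpy())   # [[1.]]

# Sigmoid maps the same logit to a real probability in (0, 1):
print(tf.nn.sigmoid(logit).numpy())   # [[0.937]]

# Hence the usual binary head: a single unit with sigmoid activation.
head = tf.keras.layers.Dense(1, activation='sigmoid')
```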

coderina

Here are my suggestions. One possible problem is that your network has started to memorize the training data; if so, you should increase regularization.

Update: here I want to mention one more problem that may cause this: the class balance ratio in the validation set is very different from what you have in the training set. As a first step, try to understand what your test data (the real-world data your model will face at inference time) looks like: what is its class balance ratio, and what are its other characteristics? Then build a train/validation set with almost the same descriptive statistics as the real data; a quick check is sketched below.
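For instance, comparing positive-class ratios between splits (a minimal sketch; `y_train` and `y_val` are hypothetical 0/1 label arrays standing in for your real splits):

```python
import numpy as np

def positive_ratio(labels):
    """Fraction of positive (e.g. damaged) examples in a 0/1 label array."""
    return np.asarray(labels).mean()

# Hypothetical labels; substitute your actual train/validation splits.
y_train = np.array([0, 0, 0, 1, 0, 1, 0, 0])
y_val   = np.array([1, 1, 0, 1])

print(f"train positive ratio: {positive_ratio(y_train):.2f}")  # 0.25
print(f"val positive ratio:   {positive_ratio(y_val):.2f}")    # 0.75 -> mismatch
```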

Ali Abbasi
  • Do you think adding more layers or dropout layers will help? – Justin Eyster Apr 19 '17 at 19:21
  • First apply dropout layers; if that doesn't help, then add more layers and more dropout. Also try reducing your filter size and increasing the number of channels. – Ali Abbasi Apr 19 '17 at 19:27
  • Our images are only one channel (black and white). Could you explain more about increasing channels? Also, do you think changing the number of filters will improve the accuracy as well? It's currently 256. – Justin Eyster Apr 19 '17 at 19:31
  • Every network configuration is found by trial and error; nobody can say in advance that changing the filters, layers, or anything else will improve your results. You should try all plausible options to reach your target accuracy. – Ali Abbasi Apr 19 '17 at 19:38