26

When training an object detection DNN with TensorFlow's Object Detection API, its visualization platform TensorBoard plots a scalar named regularization_loss_1.

What is this? I know what regularization is (making the network generalize better through various methods such as dropout), but it is not clear to me what this displayed loss could be.

Thanks!

gustavz
  • 2,964
  • 3
  • 25
  • 47

1 Answer

22

TL;DR: it's just the additional loss generated by the regularization function. Add that to the network's loss and optimize over the sum of the two.

As you correctly state, regularization methods are used to help the trained network generalize better. One way to achieve this is to add a regularization term to the loss function. This term is a generic function that modifies the "global" loss (that is, the sum of the network loss and the regularization loss) in order to drive the optimization algorithm in desired directions.
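This is not specific to the Object Detection API. As a minimal sketch (assuming a toy Keras dense model with an L2 kernel regularizer, not the detection model itself), the two losses are combined roughly like this:

```python
import tensorflow as tf

# Toy model: the kernel_regularizer makes each Dense layer contribute an
# extra loss term, which Keras collects in model.losses. That collected
# term is what a regularization_loss scalar typically reports.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        8, activation="relu", input_shape=(4,),
        kernel_regularizer=tf.keras.regularizers.l2(1e-3)),
    tf.keras.layers.Dense(1),
])

x = tf.random.normal((16, 4))
y = tf.random.normal((16, 1))

with tf.GradientTape() as tape:
    pred = model(x, training=True)
    task_loss = tf.reduce_mean(tf.square(pred - y))  # network ("task") loss
    reg_loss = tf.add_n(model.losses)                # regularization loss
    total_loss = task_loss + reg_loss                # what the optimizer minimizes

grads = tape.gradient(total_loss, model.trainable_variables)
```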

Let's say, for example, that for whatever reason I want to encourage solutions to the optimization that have weights as close to zero as possible. One approach, then, is to add to the loss produced by the network a function of the network weights (for example, a scaled-down sum of the absolute values of all the weights). Since the optimization algorithm minimizes the global loss, my regularization term (which is high when the weights are far from zero) will push the optimization towards solutions that have weights close to zero.
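A hedged sketch of that example (the weight tensors and the scale factor here are made up purely for illustration):

```python
import tensorflow as tf

# Scaled-down sum of the absolute values of all weights (an L1-style
# penalty). It is large when weights are far from zero, so minimizing
# (network_loss + penalty) pushes the weights toward zero.
def l1_penalty(weight_tensors, scale=1e-4):
    return scale * tf.add_n(
        [tf.reduce_sum(tf.abs(w)) for w in weight_tensors])

# Toy weights just for illustration.
weights = [tf.Variable(tf.random.normal((3, 3))),
           tf.Variable(tf.random.normal((3,)))]

reg_loss = l1_penalty(weights)
# total_loss = network_loss + reg_loss  # the optimizer minimizes this sum
```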

GPhilo
  • 18,519
  • 9
  • 63
  • 89
  • 1
    Why would one want weights close to zero? This information might improve the answer – Hakaishin Dec 13 '18 at 16:29
  • I only used that as an example of possible loss that is not directly related to the input data. I have no idea whether having weights close to zero is a desirable thing. – GPhilo Dec 13 '18 at 16:49
  • 8
    Weights close to zero lead to activations that are closer to the trigger boundaries of neurons (the slope of a sigmoid or a relu) and away from the saturated ends. This in turn makes your network less capable of producing highly non-linear decision boundaries making it less likely to overfit and more able to generalize, but also less likely to capture very complex patterns – Francois Zard May 19 '19 at 12:25
  • 2
    It was a simple strawman regularization term. He might have equally said "suppose I want to encourage solutions close to 42, because that is the answer to life, etc...". The form of the loss function is not the point of the question or the answer. – welch Jan 28 '20 at 21:53
  • 2
    On my TensorFlow Object Detection training the regularization loss is always increasing. Why is this happening? Any idea how to interpret this? – hafiz031 Jul 20 '20 at 00:15
  • I know it's an old post, but what would you say is the interpretation of the regularization loss being reduced on each epoch? Could that be a sign of overfitting? – Typo Aug 13 '23 at 15:34