
TensorFlow's implementation of AdamOptimizer does not have regularization parameters like those in ProximalAdagradOptimizer, for example l2_regularization_strength. Is it necessary to add an L2 norm when using AdamOptimizer?

cheng

2 Answers


TensorFlow's Adam implementation is just that: an implementation of Adam, exactly as it is defined and tested in the paper.

If you want to use Adam with L2 regularization for your problem, you simply add an L2 regularization term to your loss, with a regularization strength you choose yourself, as in the sketch below.
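For concreteness, here is a minimal TF 1.x graph-mode sketch. The toy linear model, the placeholder shapes, and the 1e-4 strength are purely illustrative; only the pattern matters: sum tf.nn.l2_loss over the trainable variables, scale it, add it to the base loss, and minimize the result with AdamOptimizer.

    import tensorflow as tf

    # Toy linear model; shapes and values are placeholders for your own model.
    x = tf.placeholder(tf.float32, [None, 10])
    y = tf.placeholder(tf.float32, [None, 1])
    w = tf.Variable(tf.zeros([10, 1]))
    b = tf.Variable(tf.zeros([1]))
    pred = tf.matmul(x, w) + b

    base_loss = tf.reduce_mean(tf.square(pred - y))

    # L2 regularization added manually to the loss; the strength is up to you.
    l2_strength = 1e-4
    l2_term = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
    loss = base_loss + l2_strength * l2_term

    # Adam then minimizes the regularized loss like any other loss.
    train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)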

I can't tell you whether that is necessary or helpful, or which regularization and regularization strength to use, because that depends heavily on the problem and is rather subjective.


Usually you add the regularization to your loss yourself, as described here. However, tf.train.ProximalAdagradOptimizer includes a special non-standard regularization that is part of the algorithm itself, and is therefore built into the optimizer rather than added to the loss; see the sketch below.
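By contrast, with the proximal variant the L2 penalty is passed to the optimizer's constructor and applied inside its update step. A small sketch, where the stand-in loss, learning rate, and regularization strength are just example values:

    import tensorflow as tf

    w = tf.Variable([1.0, 2.0])
    loss = tf.reduce_sum(tf.square(w))  # stand-in loss, purely illustrative

    # The L2 penalty is a constructor argument and is applied inside the
    # proximal update itself, not added to the loss.
    optimizer = tf.train.ProximalAdagradOptimizer(
        learning_rate=0.1,
        l2_regularization_strength=0.001)  # example value, not a recommendation
    train_op = optimizer.minimize(loss)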

BlueSun
  • Can you share more about the "special non-standard regularization"? Why is regularization part of this optimizer? The original paper is very long and too mathematical to understand. – cheng Apr 27 '18 at 02:24
  • @cheng The paper is not about why regularization is used; it is about how to regularize. Regularization is usually used to prevent overfitting. – BlueSun Apr 27 '18 at 10:35