
There are several optimizers for training neural networks, but plain SGD and SGD with momentum often seem to work better than the adaptive methods.

I am now writing a TensorFlow program to reproduce someone else's results. They trained with momentum in pylearn2, using several hyperparameters: a momentum factor, a weight scale, and a bias scale. They use the weight scale as the weights of the dropout layers.

When I train my network with Momentum, it seems very hard to train and the loss stays high. With Adam the result is not bad, but it is still worse than theirs by about 0.00X.

I want to know how to tune the Momentum optimizer, and also why my program doesn't work well.
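For reference, the classical momentum update that both pylearn2 and TensorFlow's `tf.train.MomentumOptimizer` implement can be sketched in plain Python (this is an illustration of the update rule only, not the asker's actual training code; the learning rate and momentum values below are arbitrary examples):

```python
def momentum_step(w, v, grad, lr=0.01, mu=0.9):
    """One classical-momentum update.

    v accumulates an exponentially decaying sum of past gradients
    (mu is the momentum factor); w then moves along v.
    """
    v = mu * v - lr * grad
    w = w + v
    return w, v

# Minimize f(w) = w^2 (gradient 2w) starting from w = 5.0.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, v, grad=2.0 * w)
print(f"w after 200 steps: {w:.6f}")  # converges toward 0
```

Note that with a high momentum factor the effective step size is roughly `lr / (1 - mu)`, i.e. about 10x the nominal learning rate at `mu = 0.9`. This is a common reason a network that trains fine with Adam diverges or plateaus with Momentum: the learning rate usually has to be reduced when the momentum factor is raised.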

Shai
Kai He

0 Answers