
I know that it is possible to freeze individual layers in a network, for example to train only the last layers of a pre-trained model. What I'm looking for is a way to apply different learning rates to different layers.

For example, a very low learning rate of 0.000001 for the first layer, then a gradually increasing learning rate for each of the following layers, so that the last layer ends up with a learning rate of 0.01 or so.

Is this possible in PyTorch? Any idea how I can achieve this?

MBT

1 Answer


Here is the solution:

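from torch.optim import Adam

# Net is your own nn.Module; fc, agroupoflayer and lastlayer are its named submodules.
model = Net()

optim = Adam(
    [
        # A per-group "lr" overrides the global default below.
        {"params": model.fc.parameters(), "lr": 1e-3},
        # No "lr" key: this group falls back to the global lr.
        {"params": model.agroupoflayer.parameters()},
        {"params": model.lastlayer.parameters(), "lr": 4e-2},
    ],
    lr=5e-4,  # global default learning rate
)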

Parameters that are not passed to the optimizer will not be optimized, so you should list all layers or groups (or at least the layers you want to optimize). If you don't specify a learning rate for a group, it takes the global learning rate (5e-4). The trick is that when you create the model, you should give names to the layers or group them, so that you can refer to them here.
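To get the gradually increasing learning rates described in the question, the same parameter-group mechanism can be filled in from a loop. The following is only a minimal sketch: the toy `nn.Sequential` model, its layer sizes, and the geometric spacing between `first_lr` and `last_lr` are assumptions made for illustration, not part of the original answer.

```python
import torch.nn as nn
from torch.optim import Adam

# Hypothetical toy model; the layer sizes are placeholders.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)

# Keep only modules that own parameters (ReLU has none).
layers = [m for m in model if len(list(m.parameters())) > 0]

first_lr, last_lr = 1e-6, 1e-2
n = len(layers)

# Geometric spacing from first_lr (first layer) to last_lr (last layer).
lrs = [first_lr * (last_lr / first_lr) ** (i / max(n - 1, 1)) for i in range(n)]

# One parameter group per layer, each with its own learning rate.
param_groups = [
    {"params": layer.parameters(), "lr": lr} for layer, lr in zip(layers, lrs)
]
optim = Adam(param_groups)

for group in optim.param_groups:
    print(group["lr"])
```

Geometric spacing is chosen here because the two endpoints differ by several orders of magnitude; a linear ramp would also work if that suits your model better.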

Salih Karagoz
  • Great, exactly what I was looking for - Thank you! – MBT Aug 11 '18 at 17:56
  • @salih-karagoz If you also add some references and sources to your answer, that would be a great help as well. – thanatoz Jul 13 '22 at 08:39
  • Just wanted to [add the observation](https://discuss.pytorch.org/t/different-learning-rate-for-a-specific-layer/33670/7?u=carbocation) that if you are using a custom scheduler (e.g., OneCycleLR), you will need to tell the scheduler about these learning rates. – carbocation Oct 27 '22 at 21:37
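To illustrate the scheduler point from the last comment: `OneCycleLR` accepts either a single float or a list for `max_lr`, with one entry per parameter group. The sketch below assumes a hypothetical two-layer model, made-up learning rates, and a dummy loss; only the pattern of passing a list to `max_lr` is the point.

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import OneCycleLR

# Hypothetical two-layer model; sizes are placeholders.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

optim = Adam(
    [
        {"params": model[0].parameters(), "lr": 1e-4},
        {"params": model[2].parameters(), "lr": 1e-2},
    ]
)

total_steps = 10
# One max_lr per parameter group, in the same order as the groups above.
sched = OneCycleLR(optim, max_lr=[1e-4, 1e-2], total_steps=total_steps)

# The scheduler sets each group's starting lr from its own max_lr.
initial_lrs = sched.get_last_lr()

for _ in range(total_steps):
    loss = model(torch.randn(2, 8)).sum()  # dummy loss for illustration
    optim.zero_grad()
    loss.backward()
    optim.step()
    sched.step()
```

Note that the scheduler overwrites the `lr` values given to the optimizer, so with a scheduler like this the per-layer rates are effectively defined by the `max_lr` list.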