I have a deep neural network made of a combination of modules, such as an encoder, a decoder, etc. Before training, I load part of its parameters from a pretrained model, but only for a subset of the modules. For instance, I could load a pretrained encoder. Then I want to freeze the parameters of the pretrained modules so that they are not trained along with the rest. In PyTorch:
for param in submodel.parameters():
    param.requires_grad = False
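For reference, this is roughly how I load and freeze the pretrained part (the file name "encoder.pth" and the attribute model.encoder are just placeholders for my actual setup):

import torch

# load pretrained weights into just one submodule of the full model
state_dict = torch.load("encoder.pth", map_location="cpu")  # placeholder checkpoint
model.encoder.load_state_dict(state_dict)

# freeze that submodule so the optimizer never updates its parameters
for param in model.encoder.parameters():
    param.requires_grad = False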
Now, should I keep applying dropout to these frozen modules during training, or should I deactivate it (see example below)? Why?
class MyModel(nn.Module):
    ...
    def forward(self, x):
        if self.freeze_submodule:
            self.submodule.eval()   # disable dropout when the submodule is frozen
        x = self._forward(x)
        if self.freeze_submodule:
            self.submodule.train()  # restore training mode afterwards
        return x
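To illustrate why I'm asking: as far as I can tell, setting requires_grad=False by itself does not change dropout behavior; only train()/eval() does. A quick sanity check (the submodule here is just a toy example):

import torch
import torch.nn as nn

submodule = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
for param in submodule.parameters():
    param.requires_grad = False  # frozen, but the module is still in train() mode

x = torch.ones(1, 8)
print(submodule(x))  # dropout still randomly zeroes activations
submodule.eval()
print(submodule(x))  # dropout disabled, output is deterministic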