
I'd like to fine-tune an entire block of DenseNet-161. At the moment, I know I can use the following to freeze all layers apart from the classifier:

import torch
from torchvision import models

# load a pretrained DenseNet-161 and freeze all of its parameters
model = models.densenet161(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# replace the classifier with a new, trainable two-class head
num_ftrs = model.classifier.in_features
model.classifier = torch.nn.Linear(num_ftrs, 2)

However, I'd like to unfreeze the last few layers/blocks of the DenseNet for fine-tuning. What would be the most elegant way of achieving this?

Ze0ruso

1 Answer


First of all, you can also unfreeze the classifier by setting requires_grad of its parameters to True.

for param in model.classifier.parameters():
    param.requires_grad = True

This way you keep the original parameters of that layer, instead of the new random initialization you get when creating a new nn.Linear.
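
To double-check which parameters are currently trainable, you can iterate over named_parameters(); a minimal sketch:

# sanity check: print the names of all parameters that will be trained
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)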

Unfreezing this way also works for any other submodule of the DenseNet. You can see which other submodules there are by printing the model. To unfreeze the last dense block and the final batch norm layer, you can do

# this is a torch.nn.Sequential containing the 
# "denseblock4" and "norm5" submodules
submodules = model.features[-2:]  
for param in submodules.parameters():
    param.requires_grad = True
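
As mentioned above, you can find these submodule names by inspecting the model. A minimal sketch using named_children() (the names in the comment are what torchvision's DenseNet exposes):

# list the named submodules of model.features to find the ones to unfreeze
for name, module in model.features.named_children():
    print(name, type(module).__name__)
# the last two entries are "denseblock4" and "norm5"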

If you want to reset the parameters to a new random initialization, you can use some initializer from torch.nn.init on each parameter.


As requested in the comments: how to re-initialize the last two submodules while keeping them frozen?

The last two submodules contain convolutional and batch norm layers. While you probably want to re-initialize the convolutional layers randomly, this may not be what you want for the batch norm layers.

with torch.no_grad():  # needed to re-initialize the parameters in place
    submodules = model.features[-2:] 
    for submodule in submodules.modules():
        if isinstance(submodule, torch.nn.Conv2d):
            # randomly re-initialize the weights
            torch.nn.init.kaiming_normal_(submodule.weight)
            if submodule.bias is not None:
                # reset the bias to zero
                torch.nn.init.zeros_(submodule.bias)
        elif isinstance(submodule, torch.nn.BatchNorm2d):
            torch.nn.init.ones_(submodule.weight)
            torch.nn.init.zeros_(submodule.bias)
            # also reset running mean and running_var
            torch.nn.init.zeros_(submodule.running_mean)
            torch.nn.init.ones_(submodule.running_var)

This code neither freezes nor unfreezes the parameters; they retain the requires_grad state they had initially. You can freeze or unfreeze them either before or afterwards using the usual procedure.
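
For completeness, a minimal sketch of that usual procedure, combined with an optimizer that only receives the trainable parameters (the learning rate is just a placeholder):

# unfreeze the re-initialized submodules and the classifier
for param in model.features[-2:].parameters():
    param.requires_grad = True
for param in model.classifier.parameters():
    param.requires_grad = True

# pass only the trainable parameters to the optimizer
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-4)  # placeholder learning rate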

cherrywoods
  • Ah ok, understood. The feature-extraction tutorial on PyTorch suggests what I produced above for the classifier, but this is good to know nonetheless. – Ze0ruso Dec 09 '22 at 23:33
  • Great, thanks for the clear explanation. I noticed `with torch.no_grad():`; does this mean that to get the desired behaviour we would have to perform this every epoch? – Ze0ruso Dec 14 '22 at 15:14
  • You're welcome. No, I think you will have to re-initialize the parameters only once before training. Doing it every epoch would destroy the training progress. The `torch.no_grad()` only enables resetting the parameters. If you leave it out, PyTorch complains that you can't set parameters with gradient computation enabled. – cherrywoods Dec 14 '22 at 22:40
  • Out of curiosity, why would we not want the batch norm layers set randomly? Is this due to wanting to keep the means zeroed and the variance at 1 to prevent internal covariate shift? – Ze0ruso Dec 20 '22 at 22:06
  • My primary motivation was to recreate how PyTorch initialises BatchNorm layers. Setting the mean to 0 and the variance to 1 effectively turns normalisation off. I suppose this is a more reasonable starting point than using some random values. Regarding bias and weight of BatchNorm: the following layer has random parameters, so I suppose there is no need to initialise BatchNorm bias and weight randomly, but I think you could randomly initialise them just as well. – cherrywoods Dec 21 '22 at 09:36
  • Would Kaiming initialisation be deterministic and not truly random? Or are we selecting random numbers from this distribution? And if so, would this be different for every layer that is generated via `kaiming_normal_`? – Ze0ruso Feb 03 '23 at 18:11
  • By just calling `torch.nn.init.kaiming_normal_` multiple times, you can see that it outputs different values every time. The behaviour is deterministic and uses torch's default random number generator. You can test that using `torch.manual_seed`. For more details, please ask another question :) – cherrywoods Feb 04 '23 at 20:14