53

I want to freeze part of my model. Following the official docs:

import torch
import torch.nn as nn

with torch.no_grad():
    linear = nn.Linear(1, 1)
    linear.eval()
    print(linear.weight.requires_grad)

But it prints True instead of False. If I want to set the model in eval mode, what should I do?

Qian Wang
  • [The documentation](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html#convnet-as-fixed-feature-extractor) has a simple example of fine-tuning that should help you. – jodag Aug 08 '18 at 16:53

5 Answers

98

requires_grad=False

If you want to freeze part of your model and train the rest, you can set requires_grad of the parameters you want to freeze to False.

For example, if you only want to keep the convolutional part of VGG16 fixed:

import torchvision

model = torchvision.models.vgg16(pretrained=True)
for param in model.features.parameters():
    param.requires_grad = False

By switching the requires_grad flags to False, no intermediate buffers will be saved until the computation reaches a point where one of the inputs of the operation requires the gradient.
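
For instance, here is a minimal sketch (assuming you then want to train only the classifier head; the learning rate is arbitrary) of passing just the still-trainable parameters to the optimizer:

import torch
import torchvision

model = torchvision.models.vgg16(pretrained=True)
for param in model.features.parameters():
    param.requires_grad = False

# only parameters that still require gradients are handed to the optimizer
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=1e-3)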

torch.no_grad()

Using the context manager torch.no_grad is a different way to achieve that goal: in the no_grad context, all the results of the computations will have requires_grad=False, even if the inputs have requires_grad=True. Notice that you won't be able to backpropagate the gradient to layers before the no_grad. For example:

import torch
import torch.nn as nn

x = torch.randn(2, 2)
x.requires_grad = True

lin0 = nn.Linear(2, 2)
lin1 = nn.Linear(2, 2)
lin2 = nn.Linear(2, 2)
x1 = lin0(x)
with torch.no_grad():    
    x2 = lin1(x1)
x3 = lin2(x2)
x3.sum().backward()
print(lin0.weight.grad, lin1.weight.grad, lin2.weight.grad)

outputs:

None None tensor([[-1.4481, -1.1789],
        [-1.4481, -1.1789]])

Here lin1.weight.requires_grad was True, but the gradient wasn't computed because the operation was done in the no_grad context.

model.eval()

If your goal is not to finetune, but to set your model in inference mode, the most convenient way is to use the torch.no_grad context manager. In this case you also have to set your model to evaluation mode, which is achieved by calling eval() on the nn.Module, for example:

model = torchvision.models.vgg16(pretrained=True)
model.eval()

This operation sets the attribute self.training of the layers to False; in practice this changes the behavior of operations like Dropout or BatchNorm that must behave differently at training and test time.
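
Putting the two together, a minimal inference sketch could look like this (the input is just a dummy tensor of the shape VGG16 expects):

import torch
import torchvision

model = torchvision.models.vgg16(pretrained=True)
model.eval()                      # Dropout/BatchNorm switch to inference behavior

x = torch.randn(1, 3, 224, 224)   # dummy input
with torch.no_grad():             # no intermediate buffers are kept
    out = model(x)

print(out.requires_grad)  # False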

iacolippo
  • Thanks for your explanation. Is there any efficiency difference between `torch.no_grad()` context manager and `t.requires_grad=False`, especially when it comes to memory efficiency? As you mentioned before, `t.requires_grad=False`, *no intermediate buffers will be saved*, will it be more efficient? – Qian Wang Aug 09 '18 at 09:09
  • They are equivalent – iacolippo Aug 09 '18 at 10:05
6

To complete @Salih_Karagoz's answer, you also have the torch.set_grad_enabled() context manager (covered in further detail in the documentation), which can be used to easily switch between train/eval modes:

import torch
import torch.nn as nn

linear = nn.Linear(1, 1)

is_train = False

for param in linear.parameters():
    param.requires_grad = is_train
with torch.set_grad_enabled(is_train):
    linear.eval()
    print(linear.weight.requires_grad)
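
As a usage sketch (the input tensor is arbitrary), the same boolean can drive both the module mode and the gradient tracking:

import torch
import torch.nn as nn

linear = nn.Linear(1, 1)
x = torch.randn(4, 1)

for is_train in (True, False):
    linear.train(is_train)                  # switch between train and eval mode
    with torch.set_grad_enabled(is_train):  # switch gradient tracking on/off
        out = linear(x)
    print(is_train, out.requires_grad)      # True True, then False False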
benjaminplanche
5

Here is the way:

import torch
import torch.nn as nn

linear = nn.Linear(1, 1)

for param in linear.parameters():
    param.requires_grad = False

with torch.no_grad():
    linear.eval()
    print(linear.weight.requires_grad)

OUTPUT: False

Salih Karagoz
  • this is in fact false: `linear.weight.requires_grad=True` and will remain `True` despite being in the context manager `torch.no_grad()`. PyTorch will just ignore this value during `torch.no_grad`, not change it. – Carol Eisen May 17 '23 at 23:50
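
As a quick sketch of the point made in the comment above: inside torch.no_grad() the requires_grad flag of the parameter stays True; only the outputs stop tracking gradients:

import torch
import torch.nn as nn

linear = nn.Linear(1, 1)  # weight.requires_grad is True by default
x = torch.randn(1, 1)

with torch.no_grad():
    out = linear(x)
    print(linear.weight.requires_grad)  # True  - the flag is untouched
    print(out.requires_grad)            # False - but nothing is tracked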
1

Nice. The trick is to notice that when you define a Linear layer, its parameters have requires_grad=True by default, because we would like to learn, right?

import torch.nn as nn

l = nn.Linear(1, 1)
for p in l.parameters():
    print(p)

# Parameter containing:
# tensor([[-0.3258]], requires_grad=True)
# Parameter containing:
# tensor([0.6040], requires_grad=True)    

The other construct,

with torch.no_grad():

means that gradient tracking is disabled inside it, so you cannot learn in there.

So your code just shows that the parameters are still marked as learnable, even though you created them inside torch.no_grad(), where learning is forbidden.

with torch.no_grad():
    linear = nn.Linear(1, 1)
    linear.eval()
    print(linear.weight.requires_grad)  # True

If you really plan to turn off requires_grad for the weight parameter, you can do it also with:

linear.weight.requires_grad_(False)

or

linear.weight.requires_grad = False

So your code may become like this:

with torch.no_grad():
    linear = nn.Linear(1, 1)
    linear.weight.requires_grad_(False)
    linear.eval()
    print(linear.weight.requires_grad)

If you plan to switch off requires_grad for all parameters in a module:

l = nn.Linear(1, 1)
for param in l.parameters():
    param.requires_grad_(False)
    print(param)
prosti
0

This tutorial may help.

In short, I think a good approach for this question is:

import torch.nn as nn

linear = nn.Linear(1, 1)

for param in linear.parameters():
    param.requires_grad = False

linear.eval()
print(linear.weight.requires_grad)

Meiqi
  • How is this different from / expanding over the older answers? – dedObed Mar 28 '19 at 23:16
  • @dedObed I summarized some different use cases in the tutorial for giving a precise answer to this question. The accepted answer is very good. I included the .detach() which is quite related to this question in the tutorial. – Meiqi Mar 30 '19 at 01:09
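
For reference, a minimal sketch of .detach(), which the comment above mentions as related (the shapes are arbitrary):

import torch
import torch.nn as nn

linear = nn.Linear(1, 1)
x = torch.randn(1, 1)

out = linear(x)
detached = out.detach()  # same data, but cut off from the autograd graph

print(out.requires_grad)       # True
print(detached.requires_grad)  # False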