
A model should be set to evaluation mode for inference by calling `model.eval()`.
Do we also need to do this during training, before getting the model outputs, e.g. within a training epoch, if the network contains one or more dropout and/or batch-normalization layers?

If this is not done, won't the output of the forward pass in the training epoch be affected by the randomness of the dropout?

Many code examples do not do this, and something along these lines is the common approach:

for t in range(num_epochs):
    # forward pass
    yhat = model(x)
  
    # get the loss
    loss = criterion(yhat, y)
    
    # backward pass, optimizer step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

For example, here is some example code to look at: convolutional_neural_network/main.py

Should this instead be?

for t in range(num_epochs):
    # forward pass
    model.eval() # disable dropout etc
    yhat = model(x)
    
    # get the loss
    loss = criterion(yhat, y)
    
    # backward pass, optimizer step
    model.train()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

1 Answer


TLDR:

Should this instead be?

No!

Why?

Different modules behave differently depending on whether they are in training or evaluation/test mode.
BatchNorm and Dropout are just two examples; basically, any module whose behavior depends on the training phase follows this rule.
When you call `.eval()`, you signal all modules in the model to shift their operation accordingly.
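
To see this concretely, here is a minimal sketch (the layer sizes and dropout rate are arbitrary, chosen just for illustration) showing that `.eval()` and `.train()` recursively flip the `training` flag on every submodule:

import torch.nn as nn

# Any container propagates the mode change to all of its children.
model = nn.Sequential(
    nn.Linear(10, 10),
    nn.BatchNorm1d(10),
    nn.Dropout(p=0.5),
)

model.eval()
print([m.training for m in model.modules()])  # [False, False, False, False]

model.train()
print([m.training for m in model.modules()])  # [True, True, True, True]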

Update
The answer is: during training you should not use eval mode. And yes, as long as you have not set eval mode, the dropout will be active and act randomly on each forward pass. Similarly, all other modules that have two phases will behave accordingly. That is, BN will always update its mean/var on each pass; also, if you use a batch_size of 1, it will error out, since BN cannot be computed on a batch of 1.
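
Here is a minimal sketch of that batch-of-1 failure (the feature size is arbitrary):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
x = torch.randn(1, 4)  # a batch of a single sample

bn.train()  # training mode: statistics are computed from the batch itself
try:
    bn(x)
except ValueError as e:
    print(e)  # "Expected more than 1 value per channel when training, ..."

bn.eval()   # eval mode: the running mean/var are used instead
print(bn(x))  # works fine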

As was pointed out in the comments, during training you should not call eval() before the forward pass. Doing so effectively disables all modules that behave differently in train/test mode, such as BN and Dropout (basically any module that maintains running statistics updated during training, or that impacts the network topology the way dropout does), and you will not see them contributing to your network's learning. So don't code like that!
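
For contrast, here is a sketch of the conventional pattern: train in train mode, switch to eval mode only for evaluation, then switch back. The names model, criterion, optimizer and num_epochs are assumed to exist as in the question's snippet; train_loader and val_loader are hypothetical data loaders.

import torch

for t in range(num_epochs):
    model.train()  # dropout active, BN updates its running stats
    for x, y in train_loader:
        yhat = model(x)
        loss = criterion(yhat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    model.eval()  # dropout off, BN uses its running stats
    with torch.no_grad():  # no gradients needed while evaluating
        for x, y in val_loader:
            val_loss = criterion(model(x), y)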

Let me explain a bit of what happens during training:
When you are in training mode, the modules that make up your model may have two modes, training and test. These modules either maintain statistics that need to be updated during training, like BN, or affect the network topology in a sense, like Dropout (by disabling some features during the forward pass). Some modules, such as ReLU(), operate the same way in both modes and are thus unaffected by the mode change.
When you are in training mode, you feed in an image; it passes through the layers until it faces a dropout layer, where some features are disabled, so their responses to the next layer are omitted. The output goes through the remaining layers until it reaches the end of the network and you get a prediction.

The network may make correct or wrong predictions, and the weights will be updated accordingly: if the answer was right, the features/combinations of features that led to the correct answer will be positively reinforced, and vice versa. So during training you do not need to, and should not, disable dropout; it affects the output and should be affecting it, so that the model learns a better set of features.
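
A minimal sketch of that behavior (the dropout rate and input shape are arbitrary): the same input gives a different output on every forward pass in training mode, and a deterministic identity in eval mode:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # a random half of the features zeroed, survivors scaled by 2
print(drop(x))  # a different random mask this time

drop.eval()
print(drop(x))  # all ones: dropout is a no-op in eval mode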

I hope this makes it a bit clearer for you. If you still feel you need more, say so in the comments.

  • Thanks, that is clear, but it does not really answer the question: how does one set a model to evaluation mode to get the outputs of the network during training, and is it even necessary? – Krrr Jul 30 '20 at 06:51
  • If you want to get the outputs, you are not obliged to be specifically in eval mode; that's mostly a training concept. The thing is, you set it so you know how your model has 'learned' and performs accordingly. It's not that you cannot get any outputs out of your network in training mode. If that's not what you meant, please be a bit clearer – Hossein Jul 30 '20 at 06:54
  • I have edited the question to make it clearer (hopefully)! – Krrr Jul 30 '20 at 06:59
  • @DataD'oh, the answer is yes: as long as you have not set eval mode, the dropout will be active and act randomly on each forward pass. Similarly, all other modules that have two phases will behave accordingly. That is, BN will always update its mean/var on each pass; also, if you use a batch_size of 1, it will error out, as it cannot do BN with a batch of 1. – Hossein Jul 30 '20 at 07:24
  • Thanks, if you update your answer I will accept it. – Krrr Jul 30 '20 at 08:02
  • Here is an example where BN is used, but within a training epoch `model.eval()` is not set: https://github.com/deeplearningzerotoall/PyTorch/blob/master/CNN/lab-10-6-mnist_batchnorm.py – Krrr Jul 30 '20 at 08:14
  • During training, eval is not set, for training purposes (updating params etc.). If you look closely you can see that, after the training batches are finished, it goes into eval mode and runs a series of batches; then, for the next round of training, the model goes back into training mode, and so on. Also note that in that example, they are actually evaluating the model on the `training set`, *after* it has been *trained*, and then they run an eval on the test set. – Hossein Jul 30 '20 at 09:44
  • Correct, but that's exactly the question, and I show example code clarifying this. Why does it make sense to keep e.g. dropout when getting outputs, even on (part of) the training set? – Krrr Jul 30 '20 at 09:46
  • I think you should state in your answer that the proposed approach in the question is wrong, as `model.eval()` disables dropout and the updating of batch norm. So, literally, if one uses `eval()` before the forward pass, then BN will never be updated and dropout will be omitted. The whole idea of using dropout is to force the NN to generate different outputs in each iteration, to work as a regularizer. – M. Doosti Lakhani Jul 30 '20 at 09:50
  • @DataD'oh, M. Doosti Lakhani explained it well: dropout's existence matters during training. That is, when you are feeding forward, some features are disabled and your network has to make up for it. If you disable dropout during training, what's the use of dropout then? None, as it gets disabled. – Hossein Jul 30 '20 at 10:13
  • @M.DoostiLakhani Thanks for the note. I thought it was evident and self-explanatory, but anyway, I added a note to the answer. – Hossein Jul 30 '20 at 10:19
  • @DataD'oh I added a bit more explanation; does this make it any clearer? – Hossein Jul 30 '20 at 10:29
  • @Rika Thank you for the update. The reason I asked for more clarification was that the person who asked the question thought that `model.eval` was the correct way! https://discuss.pytorch.org/t/should-one-set-model-eval-when-getting-outputs-in-a-training-epoch/91069 – M. Doosti Lakhani Jul 30 '20 at 12:32
  • @DataD'oh no problem, glad your confusion is over. – Hossein Jul 30 '20 at 15:47
  • @M.DoostiLakhani Thanks a lot. You actually did the right thing; the initial answer was indeed vague. – Hossein Jul 30 '20 at 15:55