
This is the model I defined. It is a simple LSTM with two fully connected layers.

import copy
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class mylstm(nn.Module):
    def __init__(self,input_dim, output_dim, hidden_dim,linear_dim):
        super(mylstm, self).__init__()
        self.hidden_dim=hidden_dim
        self.lstm=nn.LSTMCell(input_dim,self.hidden_dim)
        self.linear1=nn.Linear(hidden_dim,linear_dim)
        self.linear2=nn.Linear(linear_dim,output_dim)
    def forward(self, input):
        out,_=self.lstm(input)
        out=nn.Dropout(p=0.3)(out)
        out=self.linear1(out)
        out=nn.Dropout(p=0.3)(out)
        out=self.linear2(out)
        return out

x_train and x_val are float DataFrames with shape (4478, 30), while y_train and y_val are float DataFrames with shape (4478, 10).

x_train.head()
Out[271]: 
       0       1       2       3    ...        26      27      28      29
0  1.6110  1.6100  1.6293  1.6370   ...    1.6870  1.6925  1.6950  1.6905
1  1.6100  1.6293  1.6370  1.6530   ...    1.6925  1.6950  1.6905  1.6960
2  1.6293  1.6370  1.6530  1.6537   ...    1.6950  1.6905  1.6960  1.6930
3  1.6370  1.6530  1.6537  1.6620   ...    1.6905  1.6960  1.6930  1.6955
4  1.6530  1.6537  1.6620  1.6568   ...    1.6960  1.6930  1.6955  1.7040

[5 rows x 30 columns]

x_train.shape
Out[272]: (4478, 30)

I define the variables and do one step of backpropagation, after which the validation loss is 1.4941:

model=mylstm(30,10,200,100).double()
from torch import optim
optimizer=optim.RMSprop(model.parameters(), lr=0.001, alpha=0.9)
criterion=nn.L1Loss()
input_=torch.autograd.Variable(torch.from_numpy(np.array(x_train)))
target=torch.autograd.Variable(torch.from_numpy(np.array(y_train)))
input2_=torch.autograd.Variable(torch.from_numpy(np.array(x_val)))
target2=torch.autograd.Variable(torch.from_numpy(np.array(y_val)))
optimizer.zero_grad()
output=model(input_)
loss=criterion(output,target)
loss.backward()
optimizer.step()
moniter=criterion(model(input2_),target2)

moniter
Out[274]: tensor(1.4941, dtype=torch.float64, grad_fn=<L1LossBackward>)

But when I call the forward function again, I get a different number because of the randomness of dropout:

moniter=criterion(model(input2_),target2)
moniter
Out[275]: tensor(1.4943, dtype=torch.float64, grad_fn=<L1LossBackward>)
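The run-to-run difference is what an nn.Dropout module does in training mode: every call draws a fresh random mask, zeroes each element with probability p, and scales the surviving elements by 1/(1 - p). A minimal sketch on a tensor of ones (the seed, p=0.5 and the tensor size are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only to make this sketch reproducible

drop = nn.Dropout(p=0.5)  # a freshly constructed module starts in training mode
x = torch.ones(10)

a = drop(x)  # each call samples a new random mask
b = drop(x)

# Dropped elements become 0.0; survivors are scaled by 1 / (1 - 0.5) = 2.0
print(drop.training)
print(a)
print(b)
```

Because a new mask is drawn on every call, two passes over the same input generally disagree, which matches the slightly different validation losses above.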

What should I do to eliminate all dropout in the prediction phase?

I tried eval():

moniter=criterion(model.eval()(input2_),target2)
moniter
Out[282]: tensor(1.4942, dtype=torch.float64, grad_fn=<L1LossBackward>)

moniter=criterion(model.eval()(input2_),target2)
moniter
Out[283]: tensor(1.4945, dtype=torch.float64, grad_fn=<L1LossBackward>)

I also tried passing an additional parameter p to control dropout:

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class mylstm(nn.Module):
    def __init__(self,input_dim, output_dim, hidden_dim,linear_dim,p):
        super(mylstm, self).__init__()
        self.hidden_dim=hidden_dim
        self.lstm=nn.LSTMCell(input_dim,self.hidden_dim)
        self.linear1=nn.Linear(hidden_dim,linear_dim)
        self.linear2=nn.Linear(linear_dim,output_dim)
    def forward(self, input,p):
        out,_=self.lstm(input)
        out=nn.Dropout(p=p)(out)
        out=self.linear1(out)
        out=nn.Dropout(p=p)(out)
        out=self.linear2(out)
        return out

model=mylstm(30,10,200,100,0.3).double()

output=model(input_)
loss=criterion(output,target)
loss.backward()
optimizer.step()
moniter=criterion(model(input2_,0),target2)
Traceback (most recent call last):

  File "<ipython-input-286-e49b6fac918b>", line 1, in <module>
    output=model(input_)

  File "D:\Users\shan xu\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)

TypeError: forward() missing 1 required positional argument: 'p'

But neither of them worked.

Asked by Tommy Yu (edited by MBT)

  • model.eval() should work. Are you sure you haven't introduced a bug or changed the values of your input tensors? – harveyslash Dec 21 '18 at 05:45
  • Yeah, I tried removing the dropout layers, and the result stayed constant no matter how many times I ran it. So I think I got different results simply because dropout was being applied. – Tommy Yu Dec 21 '18 at 05:49

3 Answers


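You have to define your nn.Dropout layer in your __init__ and assign it to your model so that it responds to eval(). In your current code, nn.Dropout(p=0.3)(out) creates a new dropout module on every forward pass. That module is never registered with your model, and a newly created module is always in training mode, so model.eval() has no effect on it.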
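A minimal sketch that reproduces the problem (the class name FunctionalDropoutNet and the layer sizes are hypothetical): after model.eval(), two forward passes still differ, and no dropout module shows up among the registered submodules.

```python
import torch
import torch.nn as nn

class FunctionalDropoutNet(nn.Module):
    """Hypothetical model mirroring the question: dropout created inside forward()."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(20, 20)

    def forward(self, x):
        # A new, unregistered nn.Dropout is built on every call and is in training mode
        return nn.Dropout(p=0.3)(self.linear(x))

torch.manual_seed(0)
model = FunctionalDropoutNet()
model.eval()  # only reaches registered submodules (here just self.linear)

x = torch.randn(4, 20)
with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

print(model.training)                               # False
print(torch.equal(out1, out2))                      # False: dropout is still active
print([name for name, _ in model.named_modules()])  # ['', 'linear']: no dropout registered
```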

So changing your model like this should work for you:

class mylstm(nn.Module):
    def __init__(self,input_dim, output_dim, hidden_dim,linear_dim,p):
        super(mylstm, self).__init__()
        self.hidden_dim=hidden_dim
        self.lstm=nn.LSTMCell(input_dim,self.hidden_dim)
        self.linear1=nn.Linear(hidden_dim,linear_dim)
        self.linear2=nn.Linear(linear_dim,output_dim)

        # define dropout layer in __init__
        self.drop_layer = nn.Dropout(p=p)
    def forward(self, input):
        out,_= self.lstm(input)

        # apply model dropout, responsive to eval()
        out= self.drop_layer(out)
        out= self.linear1(out)

        # apply model dropout, responsive to eval()
        out= self.drop_layer(out)
        out= self.linear2(out)
        return out

If you change it like this, dropout will be inactive as soon as you call eval().
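A quick check of the corrected model, assuming random input in place of the question's x_val (the seed and batch size are arbitrary): in eval mode two forward passes match exactly, and after train() they diverge again.

```python
import torch
import torch.nn as nn

class mylstm(nn.Module):
    # Same structure as the answer's model: dropout registered once in __init__
    def __init__(self, input_dim, output_dim, hidden_dim, linear_dim, p):
        super().__init__()
        self.lstm = nn.LSTMCell(input_dim, hidden_dim)
        self.linear1 = nn.Linear(hidden_dim, linear_dim)
        self.linear2 = nn.Linear(linear_dim, output_dim)
        self.drop_layer = nn.Dropout(p=p)

    def forward(self, input):
        out, _ = self.lstm(input)  # LSTMCell returns (h, c); keep the hidden state h
        out = self.drop_layer(out)
        out = self.linear1(out)
        out = self.drop_layer(out)
        return self.linear2(out)

torch.manual_seed(0)
model = mylstm(30, 10, 200, 100, 0.3)
x = torch.randn(5, 30)  # random stand-in for x_val: 5 rows, 30 features

model.eval()
with torch.no_grad():
    e1, e2 = model(x), model(x)
print(torch.equal(e1, e2))  # True: dropout disabled in eval mode

model.train()
with torch.no_grad():
    t1, t2 = model(x), model(x)
print(torch.equal(t1, t2))  # False: dropout active again
```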

NOTE: If you want to continue training afterwards, you need to call train() on your model to leave evaluation mode.
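A common pattern is to switch to eval() (together with torch.no_grad()) only for the validation pass and call train() again before the next update. This is a sketch with a hypothetical stand-in model and random data whose shapes mirror the question (30 inputs, 10 outputs); the optimizer and loss settings are taken from the question.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Hypothetical stand-in model and random data, not the question's real dataset
model = nn.Sequential(nn.Linear(30, 100), nn.Dropout(p=0.3), nn.Linear(100, 10))
x_tr, y_tr = torch.randn(64, 30), torch.randn(64, 10)
x_va, y_va = torch.randn(16, 30), torch.randn(16, 10)

optimizer = optim.RMSprop(model.parameters(), lr=0.001, alpha=0.9)
criterion = nn.L1Loss()

val_losses = []
for epoch in range(3):
    model.train()                      # dropout active for the update step
    optimizer.zero_grad()
    loss = criterion(model(x_tr), y_tr)
    loss.backward()
    optimizer.step()

    model.eval()                       # dropout disabled for validation
    with torch.no_grad():              # no autograd graph needed for monitoring
        val_a = criterion(model(x_va), y_va).item()
        val_b = criterion(model(x_va), y_va).item()
    val_losses.append(val_a)
    print(epoch, val_a, val_a == val_b)  # repeated validation passes agree
```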


You can also find a small working example for dropout with eval() for evaluation mode here: nn.Dropout vs. F.dropout pyTorch
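The functional form can also follow eval(), as long as the module's own training flag is passed through explicitly, because torch.nn.functional.dropout defaults to training=True. A minimal sketch (the class name FunctionalDropoutLSTM and the default p are hypothetical):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionalDropoutLSTM(nn.Module):
    """Hypothetical variant of the question's model using F.dropout."""
    def __init__(self, input_dim, output_dim, hidden_dim, linear_dim, p=0.3):
        super().__init__()
        self.p = p
        self.lstm = nn.LSTMCell(input_dim, hidden_dim)
        self.linear1 = nn.Linear(hidden_dim, linear_dim)
        self.linear2 = nn.Linear(linear_dim, output_dim)

    def forward(self, input):
        out, _ = self.lstm(input)
        # self.training is toggled by train()/eval(), so dropout follows the model's mode
        out = F.dropout(out, p=self.p, training=self.training)
        out = self.linear1(out)
        out = F.dropout(out, p=self.p, training=self.training)
        return self.linear2(out)

torch.manual_seed(0)
model = FunctionalDropoutLSTM(30, 10, 200, 100).eval()  # eval() returns the module
x = torch.randn(5, 30)
with torch.no_grad():
    same = torch.equal(model(x), model(x))
print(same)  # True
```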

Answered by MBT

  • Is it cool to use the same dropout layer multiple times in a model? – bgenchel Jul 25 '19 at 20:44
  • It appears that in PyTorch, you have to define all the layers as fields in the class if you want things to work well. Am I right? When I once assigned the layers to a list (because I wanted things to be dynamic), they were not included in `.state_dict()`, so I could not save the network. I solved it by also calling `setattr(self, layer_name, layer)` within the net's `__init__` function. It appears that PyTorch will not recursively look for additional components within non-PyTorch containers, such as lists or other data structures. – SomethingSomething Dec 03 '19 at 12:19
  • @SomethingSomething Not sure if I got you right, but you might want to take a look at: [`torch.nn.ModuleList`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleList) – MBT Dec 03 '19 at 18:40
  • Thank you @blue-phoenox, this was very helpful. So the `ModuleList` is a list designated for containing components that will be recursively updated when calling methods such as `model.eval()`, `model.train()`, if I got it right. – SomethingSomething Dec 05 '19 at 07:31
  • @SomethingSomething Yes, using `nn.ModuleList` will make sure that all the parameters/modules in it get **registered** properly, so they will be visible to all `Module` methods such as `train()`. – MBT Dec 05 '19 at 12:37
  • @bgenchel I seem to have missed your comment, sorry for that. Sure, it is no problem to use the same layer multiple times, since the dropout layer has no learnable parameters. It just performs the dropout operation with the given drop rate, and it does this just as well when you use it multiple times. – MBT Aug 06 '20 at 16:40
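Both points from the comments can be checked with a small sketch (the class names and layer sizes are hypothetical): layers kept in a plain Python list are not registered, so they are missing from state_dict(), while nn.ModuleList registers them; and a single nn.Dropout instance can be reused between layers because it has no parameters.

```python
import torch.nn as nn

class PlainListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain list hides the layers from nn.Module's registration
        self.layers = [nn.Linear(8, 8), nn.Linear(8, 8)]

class ModuleListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every layer as a submodule
        self.layers = nn.ModuleList([nn.Linear(8, 8), nn.Linear(8, 8)])
        self.drop = nn.Dropout(p=0.3)  # one parameter-free dropout, shared between layers

    def forward(self, x):
        for layer in self.layers:
            x = self.drop(layer(x))
        return x

plain, registered = PlainListNet(), ModuleListNet()
print(len(plain.state_dict()))                    # 0
print(sorted(registered.state_dict()))            # ['layers.0.bias', 'layers.0.weight', 'layers.1.bias', 'layers.1.weight']
print(len(list(nn.Dropout(p=0.3).parameters())))  # 0

registered.eval()
print(all(not m.training for m in registered.modules()))  # True
```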

I'm adding this answer because I'm currently facing the same issue while trying to reproduce Deep Bayesian active learning through dropout disagreement. If you need to keep dropout active (for example, to bootstrap a set of different predictions for the same test instances), you just need to leave the model in training mode; there is no need to define your own dropout layer.

Since in PyTorch you write your own prediction function, you can just add a parameter to it like this:

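import torch

def predict_class(model, test_instance, active_dropout=False):
    # train() keeps dropout active; eval() disables it
    if active_dropout:
        model.train()
    else:
        model.eval()
    with torch.no_grad():
        return model(test_instance).argmax(dim=1)  # assumes the model returns per-class scores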
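For the bootstrapping use case described above, one alternative to putting the whole model in training mode is to switch only the dropout layers back to training mode, so that other mode-dependent layers (such as batch norm) stay in eval mode, and then aggregate several stochastic passes. A minimal sketch; the helper name mc_dropout_predict, the sample count and the model are hypothetical:

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=20):
    """Hypothetical helper: mean and spread of n_samples stochastic forward passes."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()  # re-enable only dropout; every other layer stays in eval mode
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(30, 100), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(100, 3))
x = torch.randn(4, 30)
mean, std = mc_dropout_predict(model, x)
print(mean.shape, std.shape)  # torch.Size([4, 3]) torch.Size([4, 3])
```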
Answered by Edoardo Guerriero (edited by MBT)

As the other answers said, the dropout layer should be defined in your model's __init__ method so that your model can keep track of every layer registered there. When the model's mode changes through train() or eval(), the change propagates to all registered layers. For instance, after calling model.eval(), your model deactivates the dropout layers, which then simply pass all activations through unchanged. In general, if you want to be able to deactivate your dropout layers, define them in the __init__ method using the nn.Dropout module.
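The propagation can be observed directly: train() and eval() recursively set the training flag on every registered submodule. A minimal sketch with a hypothetical three-layer model:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(30, 100), nn.Dropout(p=0.3), nn.Linear(100, 10))

model.eval()
eval_flags = [m.training for m in model.modules()]   # the Sequential itself plus 3 layers
print(eval_flags)   # [False, False, False, False]

model.train()
train_flags = [m.training for m in model.modules()]
print(train_flags)  # [True, True, True, True]
```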

Answered by two four