
In the PyTorch quickstart tutorial, the code calls model.eval() during evaluation/testing but never calls model.train() during training.

According to the documentation and the source code, some modules such as BatchNorm and Dropout need to know whether the model is in training or evaluation mode. The model in the tutorial does not use any such module, so it runs to convergence anyway. Am I missing something, or does PyTorch's very first tutorial actually have a logical bug?
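
For reference, a minimal sketch of the difference (the Dropout layer and the input tensor here are just illustrative):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()     # training mode: elements are randomly zeroed, survivors are rescaled by 1/(1-p)
print(drop(x))   # roughly half the values are 0, the rest are 2.0

drop.eval()      # evaluation mode: dropout becomes a no-op
print(drop(x))   # all values stay 1.0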

Training:

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        
        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

You can see there is no model.train() in the above code.

Testing:

def test(dataloader, model):
    size = len(dataloader.dataset)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= size
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

On the second line of the function body, model.eval() is called.

Training loop:

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model)
print("Done!")

This loop calls the train() and test() functions without any call to model.train(). So after the first call to test(), the model stays in evaluation mode forever. If we added a BatchNorm layer to the model, we would be on our way to a hard-to-find bug.
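
A minimal sketch of this sticky state, using a toy module with BatchNorm instead of the tutorial's model:

import torch.nn as nn

toy = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8))

toy.eval()             # what test() leaves behind
print(toy.training)    # False

# Without an explicit toy.train(), the next "training" epoch would run with
# BatchNorm using its running statistics instead of per-batch statistics.
toy.train()            # the missing call that restores training behaviour
print(toy.training)    # True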

Main question:

Is it good practice to always call model.train() during training and model.eval() during evaluation/test?

morteza khosravi

Comments:

I don't think it's a logical bug per se, but it is something that probably should be included for reference. As you said, however, this model will run perfectly fine in training and evaluation as there are no training-dependent states in the model's modules. – jhso May 10 '21 at 02:04

@jhso I agree. They should have used both or none. – morteza khosravi May 10 '21 at 04:13

1 Answer

As the description of the tutorial says, the quickstart is intended "to quickly familiarize yourself with PyTorch’s API", not to explain every concept.

I think the authors wanted to keep the quickstart as short as possible. If you seriously want to learn PyTorch, you will probably work through the complete tutorial, and at the latest your question will be answered there, in "Optimizing Model Parameters" (part of the official tutorial):

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    ...

But you are right: a short hint ("you will learn more about this later") would be good.
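
So yes, as a rule of thumb: call model.train() at the start of every training pass and model.eval() (together with torch.no_grad()) at the start of every evaluation pass. A minimal sketch of that convention, assuming device, the dataloaders, loss_fn and optimizer are defined as in the quickstart:

def train(dataloader, model, loss_fn, optimizer):
    model.train()                          # training behaviour for Dropout/BatchNorm layers
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        loss = loss_fn(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def test(dataloader, model, loss_fn):
    model.eval()                           # evaluation behaviour for Dropout/BatchNorm layers
    with torch.no_grad():                  # additionally skip gradient bookkeeping
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            loss_fn(model(X), y)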