In the PyTorch quickstart tutorial, the code calls model.eval() during evaluation/test, but it never calls model.train() during training.
According to this and source, some modules, such as BatchNorm and Dropout, need to know whether the model is in training or evaluation mode. The model in the tutorial does not use any such module, so it still trains to convergence. Am I missing something, or does PyTorch's very first tutorial actually have a logical bug?
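To make the mode dependence concrete, here is a minimal sketch using a standalone Dropout layer. The tensor size, the seed, and the variable names are my own choices for illustration and are not taken from the tutorial.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: elements are zeroed at random and the survivors are
# scaled by 1 / (1 - p), so every output value is either 0.0 or 2.0.
drop.train()
train_out = drop(x)

# Evaluation mode: Dropout is a no-op and returns the input unchanged.
drop.eval()
eval_out = drop(x)

print("values seen in train mode:", train_out.unique().tolist())
print("eval output equals input:", torch.equal(eval_out, x))
```

The same layer gives different outputs for the same input depending only on the mode flag, which is why the flag matters as soon as such a module appears in the model.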
Training:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
You can see there is no model.train() in the above code.
Testing:
def test(dataloader, model):
    size = len(dataloader.dataset)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= size
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
In the second line of the function body, there is a call to model.eval().
Training loop:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model)
print("Done!")
This loop calls train() and test() without ever calling model.train(). So after the first call to test(), the model stays in "evaluation" mode for every subsequent epoch. If we add a BatchNorm layer to the model, we'll be on our way to a hard-to-find bug.
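As a rough illustration of that failure mode, the sketch below uses a hypothetical toy model with a BatchNorm1d layer (not the tutorial's network; the layer sizes and batch size are arbitrary). It checks whether the BatchNorm running statistics change during a forward pass made while the model is still in evaluation mode.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed for repeatability

# Hypothetical toy model, chosen only to include a BatchNorm layer.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Linear(8, 2),
)
bn = model[1]

starts_in_train_mode = model.training  # modules default to training mode

model.eval()  # what test() does at the end of the first epoch
before = bn.running_mean.clone()

# A "training" forward pass made while the model is still in eval mode:
# BatchNorm uses its stored statistics and does not update them.
model(torch.randn(16, 4))
stuck = torch.equal(before, bn.running_mean)

# Switching back explicitly restores the training behaviour.
model.train()
model(torch.randn(16, 4))
updated = not torch.equal(before, bn.running_mean)

print(starts_in_train_mode, stuck, updated)
```

If this reading is right, making model.train() the first statement of train() would reset the mode at the start of every epoch, regardless of what test() did before.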
Main question:
Is it good practice to always call model.train() during training and model.eval() during evaluation/test?