MNIST overfitting

Question

I am currently working on the MNIST dataset. My model has overfit the training data, and I want to reduce the overfitting by using weight_decay. I am currently using 0.1 as the weight_decay value, which gives bad results: neither my validation loss nor my training loss decreases. I want to experiment with different weight_decay values so that I can plot the amount of weight_decay on the x-axis and the validation-set performance on the y-axis. How do I do that? Should I store the values in a list and iterate through them with a for loop? Below is the code that I have tried so far.

import torch
from torch import nn


class NN(nn.Module):
    def __init__(self):
        super().__init__()
        # Deep fully connected network: 784 -> 4096 -> ... -> 16 -> 10, ReLU between layers
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 4096),
            nn.ReLU(),
            nn.Linear(4096, 2048),
            nn.ReLU(),
            nn.Linear(2048, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 10))  # raw logits for the 10 digit classes

    def forward(self, x):
        return self.layers(x)


def accuracy_and_loss(model, loss_function, dataloader):
    total_correct = 0
    total_loss = 0
    total_examples = 0
    n_batches = 0
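    model.eval()  # evaluation mode; only matters if dropout or batch norm are added
    with torch.no_grad():
        for images, labels in dataloader:  # iterate over the argument, not a global testloader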
            outputs = model(images)
            batch_loss = loss_function(outputs,labels)
            n_batches += 1
            total_loss += batch_loss.item()
            _, predicted = torch.max(outputs, dim=1)
            total_examples += labels.size(0)
            total_correct += (predicted == labels).sum().item()
    accuracy = total_correct / total_examples
    mean_loss = total_loss / n_batches
    return (accuracy, mean_loss)

def define_and_train(dataset_training, dataset_test, values=(1e-8, 1e-7, 1e-6, 1e-5), n_epochs=100):
    trainloader = torch.utils.data.DataLoader(dataset_training, batch_size=500, shuffle=True)
    # The evaluation set does not need shuffling
    testloader = torch.utils.data.DataLoader(dataset_test, batch_size=500, shuffle=False)
    # Assumed loss: the model outputs raw logits, so cross-entropy is the usual choice
    loss_function = nn.CrossEntropyLoss()
    histories = {}  # weight_decay value -> training history of that run
    for params in values:
        # Fresh model and optimizer per value, so each run does not build on the previous one
        model = NN()
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=params)
        train_acc = []
        val_acc = []
        train_loss = []
        val_loss = []
        for epoch in range(n_epochs):
            model.train()
            total_loss = 0
            total_correct = 0
            total_examples = 0
            n_mini_batches = 0

            for images, labels in trainloader:
                optimizer.zero_grad()
                outputs = model(images)
                loss = loss_function(outputs, labels)
                loss.backward()
                optimizer.step()
                n_mini_batches += 1
                total_loss += loss.item()
                _, predicted = torch.max(outputs, dim=1)
                total_examples += labels.size(0)
                total_correct += (predicted == labels).sum().item()

            epoch_training_accuracy = total_correct / total_examples
            epoch_training_loss = total_loss / n_mini_batches
            epoch_val_accuracy, epoch_val_loss = accuracy_and_loss(model, loss_function, testloader)

            # %g prints small values such as 1e-08 legibly (%f would print 0.000000)
            print('weight_decay %g Epoch %d loss: %.3f acc: %.3f val_loss: %.3f val_acc: %.3f'
                  % (params, epoch + 1, epoch_training_loss, epoch_training_accuracy,
                     epoch_val_loss, epoch_val_accuracy))

            train_loss.append(epoch_training_loss)
            train_acc.append(epoch_training_accuracy)
            val_loss.append(epoch_val_loss)
            val_acc.append(epoch_val_accuracy)

        # Store this run's history; returning inside the loop would stop after the first value
        histories[params] = {'train_loss': train_loss,
                             'train_acc': train_acc,
                             'val_loss': val_loss,
                             'val_acc': val_acc}
    return histories

This is the plot that I am getting. Where am I going wrong?

I am trying to overfit so that I can understand regularisation using weight_decay. – Prajwal Mar 28 '22 at 14:42
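The plotting step the question asks about can be sketched as follows. This is a minimal sketch, assuming each run's history is collected into a dictionary keyed by its weight_decay value (for example, storing histories[params] = history inside the for params in values loop and returning only after the loop ends). The helper names summarize_sweep and plot_sweep are hypothetical, and taking the last epoch's validation loss is one choice among several (the minimum over epochs is another). A log-scale x-axis is used because values like 1e-8 to 1e-5 differ by orders of magnitude.

```python
def summarize_sweep(histories, metric="val_loss"):
    """Turn {weight_decay: history_dict} into two lists sorted by weight_decay.

    Each history_dict is assumed to hold one value per epoch under `metric`;
    the value from the last epoch is used as that run's result.
    """
    decays = sorted(histories)
    finals = [histories[wd][metric][-1] for wd in decays]
    return decays, finals


def plot_sweep(histories, metric="val_loss", path="weight_decay_sweep.png"):
    """Plot the final `metric` of each run against its weight_decay and save it to `path`."""
    # Imported here so summarize_sweep can be used without matplotlib installed
    import matplotlib.pyplot as plt

    decays, finals = summarize_sweep(histories, metric)
    fig, ax = plt.subplots()
    ax.plot(decays, finals, marker="o")
    ax.set_xscale("log")  # 1e-8, 1e-7, ... are evenly spaced on a log axis
    ax.set_xlabel("weight_decay")
    ax.set_ylabel(metric)
    fig.savefig(path)
    plt.close(fig)
    return decays, finals
```

Usage, with histories built as described: decays, finals = plot_sweep(histories). Passing metric="val_acc" plots validation accuracy instead of validation loss.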

score 0 · Answer 1 · answered Mar 28 '22 at 08:04


I don't have enough information to know for sure (for example: the loss function, the dataset size, the contents of the training and validation sets, the results after 100 or 200 epochs, and the scope of your question).

That said, even an overfitted model may still classify the validation set correctly, because MNIST is not that hard for deep learning (compared with other image-classification tasks). How about adding white noise to the validation set? You may then get a larger validation loss.
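One way to try the white-noise suggestion, as a sketch: it assumes the images are float tensors on the CPU scaled to [0, 1] (as transforms.ToTensor() produces, before any Normalize step), and the noise level std=0.3 is an arbitrary illustrative value. add_gaussian_noise and noisy_batches are hypothetical helpers, not part of PyTorch. A fixed seed keeps the noise identical across evaluation passes (with an unshuffled loader), so validation losses from different epochs stay comparable.

```python
import torch


def add_gaussian_noise(images, std=0.3, generator=None):
    """Return a noisy copy of images, assumed to be CPU float tensors in [0, 1]."""
    noise = torch.randn(images.shape, generator=generator, dtype=images.dtype) * std
    # Clamp so the noisy pixels stay in the same [0, 1] range as the clean ones
    return (images + noise).clamp(0.0, 1.0)


def noisy_batches(batches, std=0.3, seed=0):
    """Yield (noisy_images, labels) from any iterable of (images, labels) batches,
    such as a test DataLoader. With an unshuffled loader, the fixed seed gives
    each image the same noise on every pass."""
    generator = torch.Generator().manual_seed(seed)
    for images, labels in batches:
        yield add_gaussian_noise(images, std, generator), labels
```

In an evaluation loop, iterate over noisy_batches(testloader) instead of testloader; call it again for each evaluation, since a Python generator can only be consumed once.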

Or, if you want to keep using your validation set as it is, train the model for at least 1000 epochs. But, as I said above, even an overfitted model may still classify the validation set correctly.

nambee

I could do that, and reduce overfitting with weight_decay as well. But I have to pass a new value every time I run the function. I used for params in values to loop through the values, but that is not working. What is the reason? I have updated the code. – Prajwal Mar 28 '22 at 21:15
You didn't post the result graph. – nambee Mar 29 '22 at 05:10
You still do not give enough information, so I can only guess: 1. The overfitting pattern repeats every 100 epochs, which is your weight_decay update period. 2. Adam is a powerful adaptive optimizer that relies on the history of past updates, but you re-create it every 100 epochs. You should use SGD, or change the weight_decay value of the existing optimizer instead of creating a new optimizer. When you ask, try to be clearer, for example: [Problem or goal] [What you tried] [Question] [Reproducible code] [Summary]. – nambee Mar 29 '22 at 05:22
I have updated the plot. I can see that the validation loss is decreasing, which means I am reducing the overfitting. However, do I need to pass a different weight_decay value each time I run the function? My question is: how do I plot the amount of regularization (different weight_decay values) on the x-axis and the validation loss on the y-axis, to show the effect of regularization on the validation set? – Prajwal Mar 29 '22 at 13:13
Any idea how I can do that? – Prajwal Mar 29 '22 at 21:19
Training framework: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate Optimizer param update: https://stackoverflow.com/a/48324389/13624658 – nambee Mar 30 '22 at 01:10
I want to plot the different weight_decay values on the x-axis and the validation-set performance on the y-axis. How do I do that? I do not want to change the learning rate. – Prajwal Mar 30 '22 at 20:47
Based on the second link, try the code below: just copy and paste it into a Python interpreter. The first link is there so you can check the overall training framework. Code: import torch; mm = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]; oo=torch.optim.Adam(mm, 0.1,weight_decay=0.01); print(oo.param_groups[0]['weight_decay']); oo.param_groups[0]['weight_decay']=0.1; print(oo.param_groups[0]['weight_decay']) – nambee Mar 31 '22 at 03:09
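The param_groups approach from the last comment can be expanded into a small runnable sketch. set_weight_decay is a hypothetical helper, and the tiny nn.Linear model with random data is a stand-in so the block runs on its own. It changes weight_decay on one optimizer without re-creating it, so Adam's internal state (its moment estimates) is kept. Note the trade-off: with this approach each later value starts from weights already trained under the earlier values, so the runs are not independent; training a fresh model per value is the cleaner comparison if the goal is a plot of weight_decay against validation loss.

```python
import torch
from torch import nn


def set_weight_decay(optimizer, weight_decay):
    """Set weight_decay in place on every parameter group of an existing optimizer."""
    for group in optimizer.param_groups:
        group["weight_decay"] = weight_decay


torch.manual_seed(0)
model = nn.Linear(4, 2)          # stand-in for the NN model in the question
x = torch.randn(8, 4)            # random stand-in inputs
y = torch.randint(0, 2, (8,))    # random stand-in class labels
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-8)
print(optimizer.param_groups[0]["weight_decay"])  # 1e-08

for wd in [1e-8, 1e-7, 1e-6, 1e-5]:
    set_weight_decay(optimizer, wd)  # same optimizer object, so Adam's state is kept
    for _ in range(5):               # stand-in for "train N epochs with this value"
        optimizer.zero_grad()
        loss = loss_function(model(x), y)
        loss.backward()
        optimizer.step()

print(optimizer.param_groups[0]["weight_decay"])  # 1e-05
```

The learning rate is left untouched; only the weight_decay entry of each parameter group changes.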
