
I am working on a model trained on the MNIST dataset. I am using the torch.optim.Adam optimizer and have been experimenting with tuning the hyperparameters. After running a lot of tests, I have found a combination of hyperparameters that gives 90% accuracy. However, since I am new to this, I feel there might be a more efficient way to find the optimal values. The brute-force approach relies on trial and error, and I was wondering whether there is a particular strategy for finding these values. An example of the code being used:

import time

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

if __name__ == '__main__':
    end = time.time()
    model_ft = Net().to(device)  # Net and device are defined elsewhere
    print(model_ft.network)
    criterion = nn.CrossEntropyLoss()

    optimizer_ft = optim.Adam(model_ft.parameters(), lr=1e-3)

    # Halve the learning rate every 9 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=9, gamma=0.5)

    history, accuracy = train_test(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                                   num_epochs=15)
  

Here I would like to find the optimal values of:

  1. Learning Rate
  2. Step Size
  3. Gamma
  4. Number of Epochs

Any help is much appreciated!
AloneTogether
JANVI SHARMA

1 Answer

It seems a similar question has already been answered in depth.

In short, however, you can use something called grid search: you specify the values you want to try for each hyperparameter, and grid search evaluates every combination of them. This link shows how to do it with PyTorch.
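As a minimal sketch of the idea (the search space below is hypothetical, and `evaluate` is a dummy stand-in for the question's `train_test`, so the script runs without a GPU or dataset):

```python
import itertools

# Hypothetical candidate values for the four hyperparameters in the question.
param_grid = {
    "lr": [1e-2, 1e-3, 1e-4],
    "step_size": [5, 9],
    "gamma": [0.5, 0.1],
    "num_epochs": [10, 15],
}

def evaluate(params):
    """Stand-in for train_test(): train with these hyperparameters and
    return test accuracy. This placeholder just rewards lr near 1e-3."""
    return 1.0 / (1.0 + abs(params["lr"] - 1e-3))

best_acc, best_params = -1.0, None
keys = list(param_grid)
# Try every combination (3 * 2 * 2 * 2 = 24 runs) and keep the best.
for values in itertools.product(*(param_grid[k] for k in keys)):
    params = dict(zip(keys, values))
    acc = evaluate(params)
    if acc > best_acc:
        best_acc, best_params = acc, params

print(best_params)
```

In practice you would replace `evaluate` with a real training run, which is why grid search gets expensive quickly: the number of runs is the product of the number of candidates per hyperparameter.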

The following Medium post goes into more depth on other methods and packages to try, but I think you should start with a simple grid search.
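One common alternative among those methods is random search, which samples hyperparameter combinations from ranges instead of enumerating a fixed grid. A sketch under the same assumption as above (dummy `evaluate` in place of real training, made-up ranges):

```python
import random

random.seed(0)  # reproducible sampling for the sketch

def sample_params():
    """Draw one random hyperparameter combination from hypothetical ranges."""
    return {
        "lr": 10 ** random.uniform(-4, -2),   # log-uniform learning rate
        "step_size": random.randint(3, 12),
        "gamma": random.uniform(0.1, 0.9),
        "num_epochs": random.randint(5, 20),
    }

def evaluate(params):
    # Stand-in for a real training run; rewards lr near 1e-3.
    return 1.0 / (1.0 + abs(params["lr"] - 1e-3))

# Evaluate a fixed budget of 20 random combinations and keep the best.
best = max((sample_params() for _ in range(20)), key=evaluate)
print(best)
```

The budget is fixed up front (20 runs here) regardless of how many hyperparameters you search over, which is why random search tends to scale better than a full grid.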

J.vR
  • Thanks for your answer. For grid search, from what I understand, you first have to specify the candidate hyperparameter values, train the model on each combination, and then pick the best one, right? Is there a way that doesn't require us to input guessed hyperparameter values, where the function converges to the optimal values on its own? – JANVI SHARMA Dec 15 '21 at 04:43