I am looking at implementing hyper-parameter tuning for a feed-forward neural network (FNN) built with PyTorch. My original FNN (the model is named net) is trained with a mini-batch learning approach over epochs:
import torch
import torch.nn as nn
from sklearn.utils import shuffle

# Parameters
batch_size = 50        # larger batch size leads to overfitting
num_epochs = 1000
learning_rate = 0.01   # a.k.a. step size - the amount the weights are updated during training
batch_no = len(x_train) // batch_size

criterion = nn.CrossEntropyLoss()  # classification loss; expects raw logits and integer class labels
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    if epoch % 20 == 0:
        print('Epoch {}'.format(epoch + 1))
    x_train, y_train = shuffle(x_train, y_train)
    # Mini-batch learning: batch size < n (batch gradient descent) but > 1 (stochastic gradient descent)
    for i in range(batch_no):
        start = i * batch_size
        end = start + batch_size
        x_var = torch.FloatTensor(x_train[start:end])
        y_var = torch.LongTensor(y_train[start:end])
        # Forward + Backward + Optimize
        optimizer.zero_grad()
        ypred_var = net(x_var)
        loss = criterion(ypred_var, y_var)
        loss.backward()
        optimizer.step()
Lastly, I test the model on a separate, held-out test set.
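For reference, this is roughly how that test-set evaluation looks (a minimal sketch; x_test and y_test are assumed to be NumPy arrays shaped like the training data):

import torch

net.eval()  # switch off dropout/batch-norm updates, if any
with torch.no_grad():
    x_test_t = torch.FloatTensor(x_test)
    logits = net(x_test_t)
    predicted = torch.argmax(logits, dim=1)
    accuracy = (predicted.numpy() == y_test).mean()
print('Test accuracy: {:.3f}'.format(accuracy))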
I came across an approach that uses randomised search to tune the hyper-parameters together with K-fold cross-validation (scikit-learn's RandomizedSearchCV).
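This is roughly what I understand that route to look like, using the skorch wrapper so the PyTorch model behaves like a scikit-learn estimator (a sketch only; Net is assumed to be the class that net is an instance of, and x_train/y_train are NumPy arrays):

import numpy as np
import torch
import torch.nn as nn
from skorch import NeuralNetClassifier
from sklearn.model_selection import RandomizedSearchCV

# Wrap the PyTorch module so scikit-learn can clone, fit and score it
net_clf = NeuralNetClassifier(
    module=Net,
    criterion=nn.CrossEntropyLoss,
    optimizer=torch.optim.Adam,
    max_epochs=100,
    lr=0.01,
    batch_size=50,
    verbose=0,
)

# Hyper-parameter space to sample from
param_dist = {
    'lr': [0.1, 0.01, 0.001],
    'batch_size': [25, 50, 100],
    'max_epochs': [100, 500, 1000],
}

search = RandomizedSearchCV(net_clf, param_dist, n_iter=10, cv=5, scoring='accuracy')
search.fit(x_train.astype(np.float32), y_train.astype(np.int64))
print(search.best_params_)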
My question is two-fold (no pun intended!), and the first part is theoretical: is k-fold cross-validation necessary, or would it add any benefit, for a mini-batch feed-forward neural network? From what I can see, the mini-batch approach should do roughly the same job of preventing over-fitting.
I also found a good answer here, but I'm not sure it addresses a mini-batch approach specifically.
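To make the question concrete, this is roughly what adding k-fold on top of the existing loop would look like (a sketch; train_and_score is a hypothetical helper that runs the mini-batch loop above on one split and returns a validation score):

import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True)
fold_scores = []
for train_idx, val_idx in kf.split(x_train):
    # train_and_score is a placeholder for the mini-batch loop above,
    # retrained from scratch on this fold's training split
    score = train_and_score(x_train[train_idx], y_train[train_idx],
                            x_train[val_idx], y_train[val_idx])
    fold_scores.append(score)
print('Mean CV score: {:.3f}'.format(np.mean(fold_scores)))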
Secondly, if k-fold is not necessary, is there a ready-made hyper-parameter tuning function for PyTorch, so that I can avoid writing one manually?
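For context, the kind of thing I want to avoid writing by hand is a loop like this (a rough sketch; train_and_score is again a hypothetical helper wrapping the training loop above, and x_val/y_val a held-out validation split):

import random

lr_choices = [0.1, 0.01, 0.001]
batch_choices = [25, 50, 100]

best_score, best_params = -float('inf'), None
for _ in range(20):  # 20 random trials
    params = {'lr': random.choice(lr_choices),
              'batch_size': random.choice(batch_choices)}
    score = train_and_score(x_train, y_train, x_val, y_val, **params)
    if score > best_score:
        best_score, best_params = score, params
print(best_params, best_score)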