Say you are using a PyTorch optimizer such as torch.optim.Adam(model_parameters).
In your training loop you will then have something like:
optimizer = torch.optim.Adam(model_parameters)
# put the training loop here
loss.backward()        # compute gradients
optimizer.step()       # update parameters
optimizer.zero_grad()  # reset gradients for the next iteration
Is there a way to monitor what steps your optimizer is taking? I want to make sure that I am not on a flat region of the loss surface and thus taking no steps because the gradients are (near) zero. Maybe checking the learning rate would be a solution?
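For example, would something like the sketch below be a reasonable way to check this, or is there a built-in mechanism? (This assumes model_parameters comes from model.parameters() for some model; the exact logging is just an illustration.)

# After loss.backward() and before optimizer.step(): inspect the gradients
# the optimizer will use. A total norm close to zero would suggest a flat
# region where the update is effectively a no-op.
total_norm = 0.0
for p in model.parameters():
    if p.grad is not None:
        total_norm += p.grad.detach().norm(2).item() ** 2
total_norm = total_norm ** 0.5
print(f"gradient L2 norm: {total_norm:.3e}")

# Alternatively, measure how much the parameters actually move in one step.
before = [p.detach().clone() for p in model.parameters()]
optimizer.step()
update_norm = sum(
    (p.detach() - b).norm(2).item() ** 2
    for p, b in zip(model.parameters(), before)
) ** 0.5
print(f"parameter update L2 norm: {update_norm:.3e}")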