I took the code in this repository (https://github.com/tensorflow/models/tree/master/official/resnet) as a starting point and adapted it to run on another dataset with a large number of classes.
Everything seems to work fine and convergence is good, except that each time a checkpoint is restored, the loss (and the training accuracy) suddenly spikes. After some time, training recovers its previous minimum and the loss goes down again.
Is something not being restored correctly? I mean, could it be related to the checkpoint file containing nothing about the optimizer's state (like the momentum accumulators, or the gradients from the previous step)?
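For reference, here is a minimal sketch of how one could check what the checkpoint actually contains (the `/tmp/resnet_model` path is just a placeholder for your `model_dir`; the `/Momentum` suffix assumes a momentum-based optimizer like the one the official ResNet model uses):

```python
import tensorflow as tf

# Placeholder path; replace with your actual model_dir.
ckpt = tf.train.latest_checkpoint("/tmp/resnet_model")

# List every variable saved in the checkpoint. If no names with an
# optimizer slot suffix (e.g. ending in "/Momentum") show up, then the
# optimizer state is indeed reinitialized from scratch on restore.
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)
```

If the slot variables are missing from this listing, that would explain a temporary spike after each restore, since the optimizer would have to rebuild its accumulated state before training settles back down.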