
My question is quite straightforward, but I can't find a definitive answer online (so far).

I have saved the weights of a Keras model trained with an Adam optimizer after a set number of training epochs, using:

callback = tf.keras.callbacks.ModelCheckpoint(filepath=path, save_weights_only=True)
model.fit(X, y, callbacks=[callback])

When I resume training after closing my Jupyter notebook, can I simply use:

model.load_weights(path)

to continue training?

Since Adam is dependent on the epoch number (for example, in the case of learning rate decay), I would like to know the easiest way to resume training under the same conditions as before.

Following ibarrond's answer, I have written a small custom callback.

import pickle
import tensorflow as tf

optim = tf.keras.optimizers.Adam()
model.compile(optimizer=optim, loss='categorical_crossentropy', metrics=['accuracy'])

# Saves the model weights at the end of every epoch
weight_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, save_weights_only=True, verbose=1, save_best_only=False)

class optim_callback(tf.keras.callbacks.Callback):
    '''Custom callback to save the optimiser configuration at the end of each epoch'''

    def on_epoch_end(self, epoch, logs=None):
        # optim_state_pkl is the path of the pickle file that stores the optimiser config
        optim_state = optim.get_config()
        with open(optim_state_pkl, 'wb') as f_out:
            pickle.dump(optim_state, f_out)

model.fit(X, y, callbacks=[weight_callback, optim_callback()])

When I resume training:

model.load_weights(checkpoint_path)
with open(optim_state_pkl, 'rb') as f_in:
    optim_state = pickle.load(f_in)
optim = tf.keras.optimizers.Adam.from_config(optim_state)

I would just like to check if this is correct. Many thanks again!!

Addendum: On further reading of the default Keras implementation of Adam and the original Adam paper, I believe that the default Adam does not depend on the epoch number, only on the iteration number. Therefore this is unnecessary. However, the code may still be useful for anyone who wishes to keep track of other optimisers.

sunnydk
  • Why is this unnecessary if Adam is not dependent on the epoch number? Aren't epoch number and iteration number related enough that you would still want to keep track of them when resuming training? `Epoch` = `iteration` * `batch_size`, and since `batch_size` is (most of the time) constant, my guess is that they are both equally important. – ibarrond Feb 06 '20 at 08:56
  • @ibarrond The reason why I became convinced that Adam doesn't change over epochs is that printing out the dict from get_config shows the original default configuration of Adam in Keras, hence I thought it was staying constant. However, you do make me think maybe there's more to it. It doesn't really make sense to only change the lr within each epoch and reset back to the default lr in the next epoch. But I can't find any documentation in Keras on how the iteration is tracked between epochs. If it is, it doesn't seem like get_config is giving me all the information about the state of the optimiser. – sunnydk Feb 06 '20 at 18:00
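
Regarding the addendum and the comment above about how the iteration count is tracked between epochs: a small check (not from the original thread; the toy model and data are invented purely for illustration, assuming TF 2.x with eager execution) shows that the optimiser's `iterations` counter keeps accumulating across epochs, while `get_config()` does not include it:

import numpy as np
import tensorflow as tf

# Toy model and data, purely for illustration
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
optim = tf.keras.optimizers.Adam()
model.compile(optimizer=optim, loss='mse')

X = np.random.rand(32, 4)
y = np.random.rand(32, 1)

model.fit(X, y, epochs=2, batch_size=8, verbose=0)

# The step counter accumulates across epochs: 2 epochs * 4 batches = 8 updates...
print(optim.iterations.numpy())             # 8
# ...but it is not part of the serialisable config
print('iterations' in optim.get_config())   # False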

1 Answer


In order to perfectly capture the status of your optimizer, you should store its configuration using the function get_config(). This function returns a dictionary (containing the options) that can be serialized and stored in a file using pickle.

To restart the process, retrieve the dictionary with the configuration via `d = pickle.load(open('my_saved_tfconf.txt', 'rb'))` and then generate your Adam optimizer using the function from_config(d) of the Keras Adam optimizer.
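
A minimal sketch of that round trip (not from the original answer; the file name follows the example above, and the optimizer's hyperparameters are placeholders):

import pickle
import tensorflow as tf

optim = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Save the optimizer's configuration (hyperparameters only, not its internal state)
with open('my_saved_tfconf.txt', 'wb') as f:
    pickle.dump(optim.get_config(), f)

# Later, in a fresh session: rebuild an optimizer with the same hyperparameters
# and use it to compile the restored model
with open('my_saved_tfconf.txt', 'rb') as f:
    d = pickle.load(f)
restored_optim = tf.keras.optimizers.Adam.from_config(d)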

ibarrond
  • Many thanks for this tip, sounds like a more sound approach :) – sunnydk Feb 05 '20 at 13:46
  • I have added a small custom code snippet and would like to know if it is correctly implemented, if possible. Apologies for the trouble! – sunnydk Feb 05 '20 at 15:34
  • As far as I can see, get_config only returns the user config of the model: it allows you to run the same model with fresh settings, but it does not allow you to actually resume the optimization process, because it does not contain the optimizer's internal state. – Arsen Zahray Feb 04 '23 at 18:41
  • To get the internal state of the optimizer you can retrieve its [`variables`](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam#variables), serialize their values, and later on rewrite these values into the new optimizer instance with [`set_weights`](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam#set_weights), being careful to put them back in the same order. – ibarrond Feb 06 '23 at 09:06
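
Following the last comment above, here is a minimal sketch of that idea (not from the original thread; it reuses the question's `optim`, `model`, `X`, `y` and `checkpoint_path` names and assumes a TF 2.x version whose optimizers expose get_weights()/set_weights()). The optimizer's slot variables must exist before set_weights can be called, hence the dummy training step, after which the model weights are reloaded:

import pickle
import tensorflow as tf

# --- at save time (the optimizer has already been used for training) ---
# get_weights() returns the step counter plus Adam's per-variable m and v slots
with open('optim_state.pkl', 'wb') as f:
    pickle.dump(optim.get_weights(), f)

# --- at restore time, in a fresh session ---
new_optim = tf.keras.optimizers.Adam()   # same hyperparameters as before
model.compile(optimizer=new_optim, loss='categorical_crossentropy', metrics=['accuracy'])

# One dummy step so the optimizer creates its slot variables...
model.train_on_batch(X[:1], y[:1])
# ...then overwrite them with the saved values, in the same order they were saved,
# and reload the model weights since the dummy step has nudged them
with open('optim_state.pkl', 'rb') as f:
    new_optim.set_weights(pickle.load(f))
model.load_weights(checkpoint_path)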