Save and load model optimizer state

Question

I have a set of fairly complicated models that I am training and I am looking for a way to save and load the model optimizer states. The "trainer models" consist of different combinations of several other "weight models", of which some have shared weights, some have frozen weights depending on the trainer, etc. It is a bit too complicated of an example to share, but in short, I am not able to use model.save('model_file.h5') and keras.models.load_model('model_file.h5') when stopping and starting my training.

Using model.load_weights('weight_file.h5') works fine for testing my model once training has finished, but if I attempt to continue training the model this way, the loss does not come even close to returning to its previous value. I have read that this is because the optimizer state is not saved by this method, which makes sense. However, I need a way to save and load the optimizer states of my trainer models. It seems as though Keras once had model.optimizer.get_state() and model.optimizer.set_state(), which would accomplish what I am after, but that does not seem to be the case anymore (at least for the Adam optimizer). Are there any other solutions with the current Keras?

Will obtaining the states using `model.optimizer.get_config()`, saving this dictionary, and then setting each of these values to the trainer model optimizers before retraining accomplish this? — Starnetter, Mar 27 '18 at 03:30
Not likely. `get_config()` only gets properties like `lr`, `decay`, etc. The internal weights would not be returned by it. — Yu-Yang, Mar 27 '18 at 04:31
I can't see `get_state()` on keras.__version__ 2.1.6 and also in master https://github.com/keras-team/keras/blob/613aeff37a721450d94906df1a3f3cc51e2299d4/keras/optimizers.py#L60 Looks like they were removed: https://github.com/keras-team/keras/pull/437 — mrgloom, Jun 28 '19 at 11:10
As of tensorflow 2.5, if you set the optimizer of a keras model with `model.compile`, then `model.save_weights` and `model.load_weights` seem to preserve the optimizer state with no problem. — Yibo Yang, Aug 14 '21 at 18:53

score 38 · Accepted Answer · answered Mar 27 '18 at 04:29

You can extract the important lines from the load_model and save_model functions.

For saving optimizer states, in save_model:

# Save optimizer weights.
symbolic_weights = getattr(model.optimizer, 'weights')
if symbolic_weights:
    optimizer_weights_group = f.create_group('optimizer_weights')
    weight_values = K.batch_get_value(symbolic_weights)

For loading optimizer states, in load_model:

# Set optimizer weights.
if 'optimizer_weights' in f:
    # Build train function (to get weight updates).
    if isinstance(model, Sequential):
        model.model._make_train_function()
    else:
        model._make_train_function()

    # ...

    try:
        model.optimizer.set_weights(optimizer_weight_values)

Combining the lines above, here's an example:

First fit the model for 5 epochs.

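import pickle
import numpy as np
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

X, y = np.random.rand(100, 50), np.random.randint(2, size=100)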
x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5)

Epoch 1/5
100/100 [==============================] - 0s 4ms/step - loss: 0.7716
Epoch 2/5
100/100 [==============================] - 0s 64us/step - loss: 0.7678
Epoch 3/5
100/100 [==============================] - 0s 82us/step - loss: 0.7665
Epoch 4/5
100/100 [==============================] - 0s 56us/step - loss: 0.7647
Epoch 5/5
100/100 [==============================] - 0s 76us/step - loss: 0.7638

Now save the weights and optimizer states.

model.save_weights('weights.h5')
symbolic_weights = getattr(model.optimizer, 'weights')
weight_values = K.batch_get_value(symbolic_weights)
with open('optimizer.pkl', 'wb') as f:
    pickle.dump(weight_values, f)

Rebuild the model in another python session, and load weights.

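import pickle
from keras.layers import Input, Dense
from keras.models import Model

x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')

model.load_weights('weights.h5')
# Build the train function so the optimizer weights exist before set_weights
model._make_train_function()
with open('optimizer.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)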

Continue model training.

model.fit(X, y, epochs=5)

Epoch 1/5
100/100 [==============================] - 0s 674us/step - loss: 0.7629
Epoch 2/5
100/100 [==============================] - 0s 49us/step - loss: 0.7617
Epoch 3/5
100/100 [==============================] - 0s 49us/step - loss: 0.7611
Epoch 4/5
100/100 [==============================] - 0s 55us/step - loss: 0.7601
Epoch 5/5
100/100 [==============================] - 0s 49us/step - loss: 0.7594
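
The save and restore steps above can be wrapped in two small helpers. This is only a minimal sketch, not part of the original answer: the names `dump_optimizer_weights` and `load_optimizer_weights` are hypothetical, and plain NumPy arrays stand in for the values that `K.batch_get_value(model.optimizer.weights)` would return. The shape check is an added assumption, meant to surface a mismatch before `set_weights` is called.

```python
import os
import pickle
import tempfile

import numpy as np


def dump_optimizer_weights(weight_values, path):
    """Pickle a list of NumPy arrays (the optimizer's weight values)."""
    with open(path, 'wb') as f:
        pickle.dump([np.asarray(w) for w in weight_values], f)


def load_optimizer_weights(path, expected_shapes=None):
    """Load the pickled list and optionally verify count and shapes."""
    with open(path, 'rb') as f:
        weight_values = pickle.load(f)
    if expected_shapes is not None:
        shapes = [w.shape for w in weight_values]
        if shapes != [tuple(s) for s in expected_shapes]:
            raise ValueError(
                'Saved optimizer state %s does not match expected %s'
                % (shapes, list(expected_shapes)))
    return weight_values


# Stand-in for an Adam-like state on a Dense(1) layer with 50 inputs:
# a counter plus two arrays per model variable (kernel and bias).
# The ordering used by a real optimizer is not assumed here.
fake_state = [np.array(5), np.zeros((50, 1)), np.zeros((1,)),
              np.ones((50, 1)), np.ones((1,))]
path = os.path.join(tempfile.mkdtemp(), 'optimizer.pkl')
dump_optimizer_weights(fake_state, path)
restored = load_optimizer_weights(
    path, expected_shapes=[w.shape for w in fake_state])
```

In a real session the expected shapes would come from the freshly built optimizer's own weights, so a length mismatch shows up as a readable error rather than inside `set_weights`.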

I believe this appears to be working; at least the loss is not blowing up as it was before. Now it seems to start a bit higher than where it left off and descend back down a bit faster. Thanks @Yu-Yang. I ended up using the save_model and load_model functions and just removed the saving and loading of weights. — Starnetter, Mar 28 '18 at 22:44
What is here model._make_train_function()? Because I get as an error: "AttributeError: 'Model' object has no attribute '_make_train_function'" — Ciccios_1518, Jan 20 '21 at 09:57
@Yu-Yang - following up on @DvD_95's comment. I think `_make_train_function` no longer exists (at least in TF2.3). That said there is `model.make_train_function()` (without the underscore). But when I use this on an Adam Optimizer I get: **ValueError: You called set_weights(weights) on optimizer Adam with a weight list of length 255, but the optimizer was expecting 0 weights.** I checked the src code and it does seem like `set_weights` should work. Any thoughts on this? — brook, Feb 11 '21 at 17:40
@brook have you solved this issue? I have the same problem as you have. — Xyz, May 04 '21 at 00:48
TF2 more and more becoming CONFUSING themselves, buggy, terrible documentation . I will switch to Pytorch soon ! This is wasting time and energy. Why would they have to make thing complicated while Keras was so beautifully simple ? — Thư Sinh, Sep 30 '21 at 07:16

Alex Trevithick · Answer 2 · 2020-07-25T14:48:24.403

For those who are not using model.compile and instead performing automatic differentiation to apply the gradients manually with optimizer.apply_gradients, I think I have a solution.

First, save the optimizer weights: np.save(path, optimizer.get_weights())

Then, when you are ready to reload the optimizer, show the newly instantiated optimizer the size of the weights it will update by calling optimizer.apply_gradients on a list of tensors of the size of the variables for which you calculate gradients. It is extremely important to then set the weights of the model AFTER you set the weights of the optimizer because momentum-based optimizers like Adam will update the weights of the model even if we give it gradients which are zero.

import tensorflow as tf
import numpy as np

model = ...  # placeholder: instantiate model (functional or subclass of tf.keras.Model)

# Get saved weights
opt_weights = np.load('/path/to/saved/opt/weights.npy', allow_pickle=True)

grad_vars = model.trainable_weights
# This need not be model.trainable_weights; it must be a correctly-ordered list of 
# grad_vars corresponding to how you usually call the optimizer.

# lrate is the learning rate you trained with (define it before this line)
optimizer = tf.keras.optimizers.Adam(lrate)

zero_grads = [tf.zeros_like(w) for w in grad_vars]

# Apply zero gradients so the optimizer creates its slot variables
optimizer.apply_gradients(zip(zero_grads, grad_vars))

# Set the weights of the optimizer
optimizer.set_weights(opt_weights)

# NOW set the trainable weights of the model
model_weights = np.load('/path/to/saved/model/weights.npy', allow_pickle=True)
model.set_weights(model_weights)

Note that if we try to set the weights before calling apply_gradients for the first time, an error is thrown that the optimizer expects a weight list of length zero.
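
To illustrate the remark above that Adam can move the model weights even when the gradients are zero, here is a small sketch of the textbook Adam update for a single scalar weight. It is an assumption-laden illustration, not TensorFlow's actual implementation: `adam_step` is a hypothetical helper, and the hyperparameter defaults are only the commonly quoted ones.

```python
import math


def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a scalar weight; returns (w, m, v)."""
    m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)             # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Fresh optimizer (all moments zero): a zero gradient leaves the weight alone.
w_fresh, _, _ = adam_step(w=1.0, g=0.0, m=0.0, v=0.0, t=1)

# Optimizer with non-zero moments: the same zero gradient still moves the weight.
w_restored, _, _ = adam_step(w=1.0, g=0.0, m=0.1, v=0.01, t=100)
```

Under these assumptions only the second call changes the weight, so loading the model weights last is a cheap safeguard whichever state the optimizer happens to be in.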

This was helpful and saved me many hours of re-training, thanks! — karthik_ghorpade, Nov 01 '20 at 10:46
Yes, it should work for any optimizer, but it only makes sense to use it for optimizers who have weights which depend on the size of the variables being calculated — Alex Trevithick, Nov 12 '20 at 23:12
I btw found a solution to avoid `apply_gradients` and `zero_grads` calculation. The solution is to apply the `optimizer._create_all_weights(model.trainable_variables)` inside `with tf.name_scope(optimizer._name):` and `with tf.init_scope():`. The solution can be found in the source code of the `apply_gradients()` method. See [source](https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/optimizer_v2/optimizer_v2.py#L735-L771) at line 516-519. — thijsvdp, Nov 12 '20 at 23:36
BEWARE: this does NOT work with TF2 multi GPU 2.4.1 !!! Any idea please ? — Thư Sinh, Sep 30 '21 at 07:14
optimizer.get_weights() is no longer accessible in version 2.11 — AlexP, Feb 10 '23 at 01:14

Ramiro R.C. · Answer 3 · 2020-11-04T14:40:15.193

Complementing Alex Trevithick's answer, it is possible to avoid calling model.set_weights again, simply by saving the state of the variables before applying the gradients and then reloading it. This can be useful when loading a model from an h5 file, and looks cleaner (imo).

The saving/loading functions are the following (thanks Alex again):

import os
import numpy as np
import tensorflow as tf

def save_optimizer_state(optimizer, save_path, save_name):
    '''
    Save keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object.
    save_path --- Path to save location.
    save_name --- Name of the .npy file to be created.

    '''

    # Create the folder if it does not exist
    os.makedirs(save_path, exist_ok=True)
    
    # save weights
    np.save(os.path.join(save_path, save_name), optimizer.get_weights())

    return

def load_optimizer_state(optimizer, load_path, load_name, model_train_vars):
    '''
    Loads keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object to be loaded.
    load_path --- Path to save location.
    load_name --- Name of the .npy file to be read.
    model_train_vars --- List of model variables (obtained using Model.trainable_variables)

    '''

    # Load optimizer weights. np.save appends '.npy' when the name lacks it,
    # so pass load_name without the extension.
    opt_weights = np.load(os.path.join(load_path, load_name)+'.npy', allow_pickle=True)

    # dummy zero gradients
    zero_grads = [tf.zeros_like(w) for w in model_train_vars]
    # save current state of variables
    saved_vars = [tf.identity(w) for w in model_train_vars]

    # Apply zero gradients so the optimizer creates its slot variables
    optimizer.apply_gradients(zip(zero_grads, model_train_vars))

    # Restore the variables in case the step above changed them
    for var, saved in zip(model_train_vars, saved_vars):
        var.assign(saved)

    # Set the weights of the optimizer
    optimizer.set_weights(opt_weights)


    return
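
A possible variation on the storage format, offered as a sketch rather than as part of the answer above: `np.save(path, optimizer.get_weights())` has to pack arrays of different shapes into one object array, which is why loading needs `allow_pickle=True` (and, as far as I know, newer NumPy releases may refuse to build such a ragged array implicitly). Storing each array under its own key with `np.savez` avoids that. The helper names `save_weight_list` and `load_weight_list` are hypothetical, and the arrays below are stand-ins for real optimizer weights.

```python
import os
import tempfile

import numpy as np


def save_weight_list(weights, path):
    """Store each array under its own key (arr_0, arr_1, ...) in one .npz file."""
    np.savez(path, *[np.asarray(w) for w in weights])


def load_weight_list(path):
    """Read the arrays back in their original order."""
    with np.load(path) as data:
        return [data['arr_%d' % i] for i in range(len(data.files))]


# Stand-in values with mixed shapes, as an optimizer state typically has.
weights = [np.array(3), np.arange(6.0).reshape(2, 3), np.ones(4)]
path = os.path.join(tempfile.mkdtemp(), 'optimizer_state.npz')
save_weight_list(weights, path)
restored = load_weight_list(path)
```

The list returned by `load_weight_list` could then be handed to `optimizer.set_weights` in place of `opt_weights`, on optimizer versions that still provide that method.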

score 3 · Answer 4 · answered Feb 10 '23 at 01:13

From version 2.11, optimizer.get_weights() is no longer accessible. You can alternatively switch to the tf.optimizers.legacy classes, but this is not recommended.

Instead, the class tf.train.Checkpoint is specifically designed for saving both model and optimizer weights:

checkpoint = tf.train.Checkpoint(model=model, optim=optim)
# save() takes a file prefix, appends a counter, and returns the full checkpoint path
save_path = checkpoint.save(file_prefix='saved_model/ckpt')
...
checkpoint.restore(save_path)

Finally, the class tf.train.CheckpointManager manages multiple checkpoint versions and makes this very easy:

checkpoint = tf.train.Checkpoint(model=model,optim=optim)
checkpoint_manager = tf.train.CheckpointManager(checkpoint, 'saved_model', max_to_keep = 5)
checkpoint_manager.restore_or_initialize()
...
checkpoint_manager.save()

score 2 · Answer 5 · answered Oct 07 '18 at 18:48

Upgrading Keras to 2.2.4 and using pickle solved this issue for me. As of release 2.2.3, Keras models can be safely pickled.

ismail

Zaccharie Ramzi · Answer 6 · 2021-03-08T11:15:04.423

Anyone trying to use @Yu-Yang's solution in a distributed setting might run into the following error:


ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.distribute_lib._DefaultDistributionStrategy object at 0x7fdf357726d8>), which is different from the scope used for the original variable (MirroredVariable:{
  0: <tf.Variable 'conv2d_1/kernel:0' shape=(1, 1, 1, 1) dtype=float32, numpy=array([[[[-0.9592359]]]], dtype=float32)>
}). Make sure the slot variables are created under the same strategy scope. This may happen if you're restoring from a checkpoint outside the scope

or similar.

To solve this problem, you simply need to run the model's optimizer weights setting on each replica using the following:

import pickle
import tensorflow as tf

strat = tf.distribute.MirroredStrategy()

with strat.scope():
    model = tf.keras.models.Sequential([tf.keras.layers.Conv2D(1, 1, padding='same')])
    model.compile(optimizer='adam', loss='mse')
    model(tf.random.normal([1, 16, 16, 1]))

    model.load_weights('model_weights.hdf5')

def model_weight_setting():
    grad_vars = model.trainable_weights
    zero_grads = [tf.zeros_like(w) for w in grad_vars]
    model.optimizer.apply_gradients(zip(zero_grads, grad_vars))
    with open('optimizer.pkl', 'rb') as f:
        weight_values = pickle.load(f)
    model.optimizer.set_weights(weight_values)

strat.run(model_weight_setting)

For some reason, this isn't needed for setting the model weights, but make sure that you create (via the call here) and load the weights of the model within the strategy scope or you might get an error along the lines of ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.collective_all_reduce_strategy.CollectiveAllReduceStrategy object at 0x14ffdce82c50>), which is different from the scope used for the original variable.

If you want the full-on example, I created a colab showcasing this solution.

score 0 · Answer 7 · answered Jul 22 '21 at 20:00

The code below works for me (Tensorflow 2.5).
I'm using the universal sentence encoder as model, together with an Adam optimizer.

Basically what I do is: I make use of a dummy input which sets the optimizer correctly.
Afterwards I set the weights.

Save the weights of the optimizer

np.save(f'{path}/optimizer.npy', optimizer.get_weights())

Load the optimizer

import numpy as np
import tensorflow as tf

# Create a fresh optimizer
optimizer = tf.keras.optimizers.Adam()

# Load the optimizer weights
opt_weights = np.load(f'{path}/optimizer.npy', allow_pickle=True)

# Train on a dummy record
# I'm using the universal sentence encoder which requires a string as input
with tf.GradientTape() as tape:
    # predict a dummy record
    tmp = model('')
    # create a dummy loss
    loss = tf.reduce_mean((tmp - tmp)**2)

# calculate the gradients and apply them
# the gradients should be near 0
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# set the weights
optimizer.set_weights(opt_weights)
