
I load a model via keras.models.load_model that was saved via model.save.

Then I try to print the optimizer state:

from keras import backend as K
tf_session = K.get_session()
print(model.optimizer.iterations)
print(model.optimizer.lr)
print(model.optimizer.iterations.eval(session=tf_session))
print(model.optimizer.lr.eval(session=tf_session))

Which prints:

<tf.Variable 'Adadelta/iterations:0' shape=() dtype=int64_ref>
<tf.Variable 'Adadelta/lr:0' shape=() dtype=float32_ref>
0
1.0

Or another way to obtain the optimizer parameters:

print(model.optimizer.get_config())
{'lr': 1.0, 'rho': 0.95, 'decay': 0.0, 'epsilon': 1e-07}

So my question is: does Keras reset the optimizer state on model load?

According to this https://github.com/keras-team/keras/blob/master/keras/engine/saving.py#L473 it should save the model's optimizer state.

And here is the actual code that saves the optimizer state: https://github.com/keras-team/keras/blob/613aeff37a721450d94906df1a3f3cc51e2299d4/keras/engine/saving.py#L132

Optimizer config: https://github.com/keras-team/keras/blob/613aeff37a721450d94906df1a3f3cc51e2299d4/keras/engine/saving.py#L146

Optimizer weights: https://github.com/keras-team/keras/blob/613aeff37a721450d94906df1a3f3cc51e2299d4/keras/engine/saving.py#L157

UPDATE:

What does model.optimizer.weights contain?

keras.__version__ 2.1.6

print('len(model.get_weights())', len(model.get_weights()))
w1 = model.get_weights()[0]
print('type(w1)', type(w1))
print('w1.shape', w1.shape)

len(model.get_weights()) 86
type(w1) <class 'numpy.ndarray'>
w1.shape (3, 3, 3, 16)

print('len(model.optimizer.get_weights())', len(model.optimizer.get_weights()))
w2 = model.optimizer.get_weights()[0]
print('type(w2)', type(w2))
print('w2.shape', w2.shape)

len(model.optimizer.get_weights()) 116
type(w2) <class 'numpy.ndarray'>
w2.shape (3, 3, 3, 16)

import numpy as np
print('max abs diff w1-w2', np.max(np.abs(w1 - w2)))
max abs diff w1-w2 0.8932746

1 Answer


It should save the state. The states are not reset when loading.

The correct way to check this is using the model.optimizer.weights list:

from keras import backend as K
from keras.models import load_model

model = load_model(....)
loaded_optimizer_states = [K.eval(w) for w in model.optimizer.weights]

# recompiling resets the optimizer
model.compile(optimizer='adadelta', ...)
reset_optimizer_states = [K.eval(w) for w in model.optimizer.weights]

for w1, w2 in zip(loaded_optimizer_states, reset_optimizer_states):
    print('equal?', (w1 == w2).all())

Now, it doesn't necessarily save everything we want. The lr, for instance, is not usually a weight but just a config value. The actual lr is calculated internally from the lr config and the iterations value.
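For illustration, this is roughly the calculation Keras 2.x optimizers apply when decay > 0 (a sketch assuming the model was loaded and its optimizer exposes lr and decay attributes, as SGD and Adadelta do):

from keras import backend as K

lr = K.get_value(model.optimizer.lr)
decay = K.get_value(model.optimizer.decay)
iterations = K.get_value(model.optimizer.iterations)
# effective lr = lr / (1 + decay * iterations), recomputed at every update
effective_lr = lr * (1. / (1. + decay * iterations))
print('effective lr after', iterations, 'iterations:', effective_lr)

So if iterations is reset to 0 on load, the effective lr jumps back to the configured lr.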

But you can also see in the source code, in the get_updates method of the optimizers, that:

  • SGD saves the iterations as a weight: self.weights = [self.iterations] + moments
  • But Adadelta doesn't: self.weights = accumulators + delta_accumulators

So, although the weights should be saved, you're looking at the wrong variables, and Adadelta's code seems to be buggy here. If you use decay with Adadelta, you should probably save and load iterations manually, or create a custom copy of the optimizer's code where you add iterations to the weights, changing the line above to:

self.weights = [self.iterations] + accumulators + delta_accumulators
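A minimal sketch of that workaround, assuming Keras 2.x (the class name AdadeltaWithIterations is made up here). Instead of copying the whole optimizer file, you can subclass Adadelta and patch self.weights after the parent builds it:

from keras.optimizers import Adadelta

class AdadeltaWithIterations(Adadelta):
    # Hypothetical subclass: same math as Adadelta, but `iterations`
    # is prepended to self.weights so it gets saved/loaded with the state
    def get_updates(self, loss, params):
        updates = super(AdadeltaWithIterations, self).get_updates(loss, params)
        if not self.weights or self.weights[0] is not self.iterations:
            self.weights = [self.iterations] + self.weights
        return updates

When loading, the custom class has to be passed in: load_model(path, custom_objects={'AdadeltaWithIterations': AdadeltaWithIterations}).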

Looking at the code, SGD seems to be the only optimizer that actually saves iterations, which looks like a general bug in saving/loading optimizer states.

Opened this issue: https://github.com/keras-team/keras/issues/13027

What are model.optimizer.weights?

They are two different things:

  • model.weights: the weights of the model; they make the model work correctly even if you don't have an optimizer (it's possible to use an uncompiled model if you just want to make predictions)
  • model.optimizer.weights: the state of the optimizer. They are not necessarily related to the model's weights; they just define how the optimizer should update the model's weights during training.

Now, what is each of the weights in the list?

That depends a lot on which optimizer you are using. You can see the source code to understand what each optimizer saves as states.

The SGD optimizer has self.weights = [self.iterations] + moments. This means the SGD state contains the current iteration (which is used to compute the current lr when there is a decay) and the optimizer's moments.

The moments are a list of tensors with the same shapes as the entries of model.get_weights(), because there is one moment tensor for each of the model's weights.
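As a quick sanity check (a sketch, assuming a model trained with SGD plus momentum), the state splits into the iteration counter and one moment tensor per trainable weight:

from keras import backend as K

state = model.optimizer.get_weights()      # [iterations] + moments for SGD
iterations, moments = state[0], state[1:]
for w, m in zip(model.trainable_weights, moments):
    print(w.name, K.int_shape(w), m.shape)  # shapes should match pairwise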

Other optimizers use more complex mathematical calculations and can have more things as optimizer weights. Adadelta, for instance, has accumulators and delta_accumulators. I don't know exactly what they are; we would have to study the mathematical formulation of this optimizer. But it's something along the same lines as SGD: optimizer state that defines how the model's weights are updated during training. They presumably also have the same shapes as the model's trainable weights, but appearing twice (one accumulator and one delta accumulator per weight).
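Given the self.weights = accumulators + delta_accumulators line above, the Adadelta state can be split in half (a sketch; this would also explain the numbers in the question: 116 optimizer weights = 2 × 58 trainable weights, while model.get_weights() returns 86 entries, likely because it also includes non-trainable weights such as BatchNorm moving statistics):

state = model.optimizer.get_weights()
half = len(state) // 2
accumulators, delta_accumulators = state[:half], state[half:]
print(len(accumulators), len(delta_accumulators))  # one of each per trainable weight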

Daniel Möller
  • 84,878
  • 18
  • 192
  • 214
  • What are `self.weights` (`model.optimizer.weights`) actually, i.e. what do they contain and how are they related to `model.weights`? See my update. – mrgloom Jun 28 '19 at 12:39
  • See the answer. These vars aren't really meant for regular users to understand. You only need to understand them fully if you want to do very tricky things or write custom optimizers. – Daniel Möller Jun 28 '19 at 12:56
  • It seems there is a version of manually saving and loading the optimizer state using `model.optimizer.set_weights` here: https://stackoverflow.com/a/49504376/1179925 – mrgloom Jun 28 '19 at 13:08
  • The state is saved and loaded automatically. There is only the problem with `iterations`. You should save and load `iterations` manually, the rest is ok. Use `K.set_value(model.optimizer.iterations, loaded_value)` after loading. **Do not compile a model after loading, this will reset the optimizer**. – Daniel Möller Jun 28 '19 at 13:10
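A sketch of the manual workaround from the last comment (the file names are made up; assumes Keras 2.x):

import numpy as np
from keras import backend as K
from keras.models import load_model

# after training: save `iterations` separately, since Adadelta doesn't keep it
np.save('iterations.npy', K.get_value(model.optimizer.iterations))

# after loading: restore it; do NOT call model.compile(), that resets the optimizer
model = load_model('model.h5')
K.set_value(model.optimizer.iterations, int(np.load('iterations.npy')))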