
I've trained a CNN model in TensorFlow eager mode. Now I'm trying to restore the trained model from a checkpoint file but haven't had any success.

All the examples I've found (like the one below) restore a checkpoint into a Session. But what I need is to restore the model in eager mode, i.e. without creating a session.

with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")

Basically what I need is something like:

tfe.enable_eager_execution()
model = tfe.restore('model.ckpt')
model.predict(...)

and then I can use the model to make predictions.

Can someone please help?

Update

The example code can be found at: mnist eager mode demo

I've tried to follow the steps from @Jay Shah's answer and it almost worked, but the restored model doesn't have any variables in it.

tfe.save_network_checkpoint(model,'./test/my_model.ckpt')

Out[58]:
'./test/my_model.ckpt-1720'

model2 = MNISTModel()
tfe.restore_network_checkpoint(model2,'./test/my_model.ckpt-1720')
model2.variables

Out[72]:
[]

The original model has lots of variables in it:

model.variables

[<tf.Variable 'mnist_model_1/conv2d/kernel:0' shape=(5, 5, 1, 32) dtype=float32, numpy=
 array([[[[ -8.25184360e-02,   6.77833706e-03,   6.97569922e-02,...
Allen Qin
  • Why are the checkpoint names different? The path you save to is different from the one you are restoring from... the output seems strange to me – Jai Dec 29 '17 at 04:58
  • If I use the same checkpoint name, it won't work. 'my_model.ckpt-1720' is the name returned by the save_network_checkpoint function. As per the documentation, this is the name that should be used for restoring the model. – Allen Qin Dec 29 '17 at 05:17
  • Ohh.. yeah, the returned value has to go... I was just making sure that you are doing it the right way – Jai Dec 29 '17 at 05:19
  • Hey @Allen, can you try this for saving the model: `tf.contrib.eager.Saver([variable_list]).save(chkpt_file)`. It will return a string, so when restoring use that string as follows: `tf.contrib.eager.Saver.restore(returned_string)`. This mimics `tf.train.Saver` but no session is needed; eager mode has to be enabled though. When constructing the Saver object you have to pass the list of variables you want to store (you can get the variable list from my answer), and the variables have to be `tfe.Variable` – Jai Dec 29 '17 at 05:29

5 Answers


Eager execution is still a new feature in TensorFlow and isn't included in the latest stable release, so not all features are supported, but fortunately loading a model from a saved checkpoint is.

You'll need to use the tfe.Saver class (which is a thin wrapper over the tf.train.Saver class), and your code should look something like this:

saver = tfe.Saver([x, y])
saver.restore('/tmp/ckpt')

Here [x, y] is the list of variables and/or models you wish to restore; it should precisely match the variables that were passed when the saver which created the checkpoint was constructed.

More details, including sample code, can be found here, and the API details of the saver can be found here.
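
For illustration, a minimal save/restore round trip might look like this (the variable names and checkpoint path are just placeholders, not from the original answer):

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

# Two example variables standing in for a model's weights.
x = tfe.Variable(10.0, name='x')
y = tfe.Variable(5.0, name='y')

# Save both variables under a checkpoint prefix.
save_path = tfe.Saver([x, y]).save('/tmp/ckpt')

# Pretend training (or a new process) changed the values...
x.assign(0.0)
y.assign(0.0)

# ...then restore them from the checkpoint by variable name.
tfe.Saver([x, y]).restore(save_path)
print(x.numpy(), y.numpy())  # 10.0 5.0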

mr_snuffles
  • Specifically [tfe.restore_variables_on_create](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/g3doc/guide.md#tfenetwork) is very useful if your variables haven't been created yet when you want to restore. This is also used in the [eager mnist example](https://github.com/tensorflow/tensorflow/blob/97a4c226e8a9e7c5c36fc38e2b9f8459c77abd5a/tensorflow/contrib/eager/python/examples/mnist/mnist.py#L201). – Allen Lavoie Dec 22 '17 at 19:24
  • Thanks @mr_snuffles for your answer. You showed how to restore variables. Can you please explain how to restore the model as it's trained in the mnist eager mode tutorial https://github.com/tensorflow/tensorflow/blob/97a4c226e8a9e7c5c36fc38e2b9f8459c77abd5a/tensorflow/contrib/eager/python/examples/mnist/mnist.py#L201 ? – Allen Qin Dec 26 '17 at 20:43
  • @Allen You save your session using saver.save, then call on that saver to restore the model from the latest checkpoint. It should look something like: `sess = tf.Session() new_saver = tf.train.import_meta_graph('my-model.meta') new_saver.restore(sess, tf.train.latest_checkpoint('./'))` More details about saving and loading models can be found [here](https://stackoverflow.com/questions/33759623/tensorflow-how-to-save-restore-a-model) – mr_snuffles Jan 01 '18 at 00:41
  • So as of September 2018 it still seems that eager mode cannot load a graph and a checkpoint. Session graph coding is still required, making the researcher use TWO different TF APIs. It's not simpler than before, guys. TF is worse than before. – Geoffrey Anderson Sep 24 '18 at 16:57

OK, after spending a few hours running the code line by line, I've figured out a way to restore a checkpoint into a new TensorFlow eager mode model.

Using the examples from TF Eager Mode MNIST

Steps:

  1. After your model has been trained, find the latest checkpoint (or the checkpoint you want) index file in the checkpoint folder created during training, such as 'ckpt-25800.index'. Use only the filename 'ckpt-25800' when restoring in step 5.

  2. Start a new Python terminal and enable TensorFlow eager mode by running:

    tfe.enable_eager_execution()

  3. Create a new instance of the MNISTModel:

    model_new = MNISTModel()

  4. Initialise the variables for model_new by running a dummy forward pass once. (This step is important: without initialising the variables first, they can't be restored in the following step. However, I couldn't find another way to initialise variables in eager mode other than what I did below.)

    model_new(tfe.Variable(np.zeros((1,784),dtype=np.float32)), training=True)

  5. Restore the variables to model_new using the checkpoint identified in step 1.

    tfe.Saver(model_new.variables).restore('./tf_checkpoints/ckpt-25800')

  6. If the restore process is successful, you should see something like:

    INFO:tensorflow:Restoring parameters from ./tf_checkpoints/ckpt-25800

Now the checkpoint has been successfully restored to model_new and you can use it to make predictions on new data.
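
Putting the steps together, a condensed sketch might look like this (the MNISTModel class and the checkpoint path come from the linked MNIST example; treat them as assumptions):

import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

# Recreate the architecture that was used during training.
model_new = MNISTModel()

# One dummy forward pass so the layers create their variables.
dummy_batch = tfe.Variable(np.zeros((1, 784), dtype=np.float32))
model_new(dummy_batch, training=True)

# Restore the trained weights into the freshly created variables.
tfe.Saver(model_new.variables).restore('./tf_checkpoints/ckpt-25800')

# The restored model can now be used for predictions.
predictions = model_new(dummy_batch, training=False)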

Stefan Falk
Allen Qin
  • This works, but running a dummy forward pass looks yucky to say the least :\ not your fault, but TF's – gokul_uf Sep 03 '18 at 13:32
  • How can we use TFE API to load a model from disk, completely, and populate all its nodes, graphs, variables, hyperparams, etc? Do not use MNISTModel() code to repeat the model creation code. Do not use graph API and session. Use TFE api only. TF people still have provided no working example of such a basic need. Smartalecs everywhere just give links to the docs as if that's worth a penny. – Geoffrey Anderson Sep 24 '18 at 17:13

I'd like to share the TFLearn library, a deep learning library featuring a higher-level API for TensorFlow. With this library you can easily save and restore a model.

Saving a model

model = tflearn.DNN(net)  # Here 'net' is your designed network model.
# A sample call for training the model
model.fit(train_x, train_y, n_epoch=10, validation_set=(test_x, test_y), batch_size=10, show_metric=True)
model.save("model_name.ckpt")

Restoring a model

model = tflearn.DNN(net)
model.load("model_name.ckpt")
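
For context, here is a minimal sketch of what `net` might look like; the input size, layer widths and hyperparameters are illustrative assumptions, not part of the original answer:

import tflearn

# A small fully connected network; all sizes are placeholders.
net = tflearn.input_data(shape=[None, 784])
net = tflearn.fully_connected(net, 128, activation='relu')
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')

# This 'net' is what gets wrapped by tflearn.DNN(net) above.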

For more TFLearn examples you can check sites like...

R.A.Munna
  • Thanks for your answer, it's good to know it can be done using TFLearn. However, I would still like to find a way to do it in TensorFlow itself. – Allen Qin Dec 26 '17 at 20:45
  • First you save your model to a checkpoint by doing the following:

saver.save(sess, './my_model.ckpt')

  • In the above line you are saving your session to the "my_model.ckpt" checkpoint

The following code restores the model:

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './my_model.ckpt')
  • When you restore the session as a model, you restore your model from the checkpoint
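
As a consolidated sketch of the graph-mode (session) flow above, with a placeholder variable and path:

import tensorflow as tf

# A trivial graph with a single variable standing in for a model.
w = tf.Variable(3.0, name='w')
saver = tf.train.Saver()

# Save in one session...
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, './my_model.ckpt')

# ...and restore in another.
with tf.Session() as sess:
    saver.restore(sess, './my_model.ckpt')
    print(sess.run(w))  # 3.0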

To save in eager mode:

tf.contrib.eager.save_network_checkpoint(sess,'./my_model.ckpt')

To restore in eager mode:

tf.contrib.eager.restore_network_checkpoint(sess,'./my_model.ckpt')

Here sess is an object of the Network class. Any Network object can be saved and restored. A quick explanation of Network objects:

class TwoLayerNetwork(tfe.Network):
    def __init__(self, name):
        super(TwoLayerNetwork, self).__init__(name=name)
        self.layer_one = self.track_layer(tf.layers.Dense(16, input_shape=(8,)))
        self.layer_two = self.track_layer(tf.layers.Dense(1, input_shape=(16,)))
    def call(self, inputs):
        return self.layer_two(self.layer_one(inputs))

After constructing an object and calling the Network, a list of variables created by tracked Layers is available via Network.variables:

sess = TwoLayerNetwork(name="net")   # sess is an object of Network
output = sess(tf.ones([1, 8]))
print([v.name for v in sess.variables])

This example prints variable names, one kernel and one bias per `tf.layers.Dense` layer:

['net/dense/kernel:0',
 'net/dense/bias:0',
 'net/dense_1/kernel:0',
 'net/dense_1/bias:0']

These variables can be passed to a Saver (`tf.train.Saver`, or `tf.contrib.eager.Saver` when executing eagerly) to save or restore the Network.

tfe.save_network_checkpoint(sess, './my_model.ckpt')      # saving the model
tfe.restore_network_checkpoint(sess, './my_model.ckpt')   # restoring
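
Building on this, here is a sketch of one way to deal with the empty-variables problem from the question: in a fresh Python session, create the Network, call it once so its layers build their variables, then restore. (The class, name and path reuse the example above and are assumptions.)

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

# A fresh Network instance has no variables until it is called once.
restored = TwoLayerNetwork(name="net")
restored(tf.ones([1, 8]))   # builds the layers, creating the variables

# Fill the freshly created variables from the checkpoint.
tfe.restore_network_checkpoint(restored, './my_model.ckpt')
print([v.name for v in restored.variables])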
Jai
  • Thanks @jay Shah. I want the model to be restored in eager mode. – Allen Qin Dec 28 '17 at 23:11
  • @Allen I have edited my answer according to your requirement ... check it out – Jai Dec 28 '17 at 23:22
  • Thanks for the update. This has almost worked for me. I have been able to save my trained model, but when I tried to restore it, the restored model doesn't have any variables in it. Please see the update in my question. – Allen Qin Dec 29 '17 at 04:04
  • The requirements are: 1 - Needs eager mode not session code. 2 - It has to work. – Geoffrey Anderson Sep 24 '18 at 17:16

Saving variables with tfe.Saver().save():

for epoch in range(epochs):
    train_and_optimize()
    all_variables = model.variables + optimizer.variables()

    # save the variables
    tfe.Saver(all_variables).save(checkpoint_prefix)

And then reload the saved variables with tfe.Saver().restore():

tfe.Saver(model.variables + optimizer.variables()).restore(checkpoint_prefix)

The model is then loaded with the saved variables; there is no need to create a new one as in @Stefan Falk's answer.
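
For reference, a sketch of how checkpoint_prefix might be set up and how the newest checkpoint can be located at restore time (the directory name and the model/optimizer objects are placeholders):

import os
import tensorflow as tf
import tensorflow.contrib.eager as tfe

checkpoint_dir = './tf_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt')

# During training: save model and optimizer variables under the prefix.
all_variables = model.variables + optimizer.variables()
tfe.Saver(all_variables).save(checkpoint_prefix)

# Later: restore from whichever checkpoint in that directory is newest.
latest = tf.train.latest_checkpoint(checkpoint_dir)
tfe.Saver(model.variables + optimizer.variables()).restore(latest)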

ywfu