656

After you train a model in Tensorflow:

  1. How do you save the trained model?
  2. How do you later restore this saved model?
Braiam
mathetes
  • Were you able to restore variables used in inception model? I am also trying the exact same problem but I am unable to write set of variables that were used while training the inception model (of which I have ckpt file) – exAres Oct 11 '16 at 17:52
  • I haven't tried with the inception model. Do you have the model's network structure with its names? You have to replicate the network and then load the weights and biases (the ckpt file) as Ryan explains. Maybe something has changed since Nov'15 and there's a more straightforward approach now, I'm not sure – mathetes Oct 11 '16 at 18:22
  • Ohh okay. I have loaded other pre-trained tensorflow models previously but was looking for variable specifications of inception model. Thanks. – exAres Oct 11 '16 at 18:30
  • 1
    If you restore to continue to train, just use the Saver checkpoints. If you save the model to do reference, just the tensorflow SavedModel APIs. – HY G Dec 20 '17 at 09:28
  • Also if you are using LSTM, you will have a map from string to a list of characters, be sure to save and load that list in the same order! This is not covered by saving the model weights and model graph network and will make it seem like your model was not loaded when you change sessions or the data changes. – devssh Sep 26 '18 at 06:02

28 Answers

271

In (and after) Tensorflow version 0.11:

Save the model:

import tensorflow as tf

#Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1= tf.Variable(2.0,name="bias")
feed_dict ={w1:4,w2:8}

#Define a test operation that we will restore
w3 = tf.add(w1,w2)
w4 = tf.multiply(w3,b1,name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())

#Create a saver object which will save all the variables
saver = tf.train.Saver()

#Run the operation by feeding input
print(sess.run(w4,feed_dict))
#Prints 24.0, which is (w1+w2)*b1 = (4+8)*2

#Now, save the graph
saver.save(sess, 'my_test_model',global_step=1000)
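
For orientation, a save call like the one above usually leaves several files on disk, and the restore step refers to them by their common prefix. The helper below is plain Python that only builds the expected file names; it does not call TensorFlow. It assumes the V2 checkpoint layout with a single data shard (older releases may write a different layout), and `expected_checkpoint_files` is a hypothetical name used only for illustration.

```python
def expected_checkpoint_files(prefix, global_step=None, num_shards=1):
    """Return the file names a Saver.save call is expected to write.

    Assumes the V2 checkpoint layout; purely illustrative, no TensorFlow needed.
    """
    # With global_step set, the step number is appended to the prefix.
    stem = prefix if global_step is None else "{}-{}".format(prefix, global_step)
    # 'checkpoint' is a small text file recording the most recent prefix.
    files = ["checkpoint", stem + ".meta", stem + ".index"]
    for shard in range(num_shards):
        files.append("{}.data-{:05d}-of-{:05d}".format(stem, shard, num_shards))
    return files


for name in expected_checkpoint_files("my_test_model", global_step=1000):
    print(name)
```

Under these assumptions the `.meta` file holds the graph (what `import_meta_graph` reads), while the `.index` and `.data-*` files hold the variable values.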

Restore the model:

import tensorflow as tf

sess=tf.Session()    
#First let's load meta graph and restore weights
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
saver.restore(sess,tf.train.latest_checkpoint('./'))


# Access saved Variables directly
print(sess.run('bias:0'))
# This will print 2, which is the value of bias that we saved


# Now, look up the saved placeholders by name and
# build a feed_dict to feed new data

graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict ={w1:13.0,w2:17.0}

#Now, access the op that you want to run. 
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")

print(sess.run(op_to_restore,feed_dict))
#This will print 60.0, which is calculated as (13+17)*2
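
The `:0` suffix in names such as `"w1:0"` and `"op_to_restore:0"` follows the graph's naming scheme: a tensor is named after the operation that produces it, followed by the index of that output. `get_tensor_by_name` expects the full `op_name:index` form, whereas `get_operation_by_name` takes the bare operation name. A plain-Python sketch of the convention (the helper `split_tensor_name` is hypothetical, not a TensorFlow function):

```python
def split_tensor_name(tensor_name):
    """Split '<op_name>:<output_index>' into its two parts."""
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError(
            "expected '<op_name>:<output_index>', got %r" % tensor_name)
    return op_name, int(index)


print(split_tensor_name("op_to_restore:0"))  # ('op_to_restore', 0)
print(split_tensor_name("bias:0"))           # ('bias', 0)
```

Most operations have a single output, which is why `:0` is the suffix seen most often.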

This and some more advanced use-cases have been explained very well here.

A quick complete tutorial to save and restore Tensorflow models

desertnaut
sankit
  • 5
    +1 for this # Access saved Variables directly print(sess.run('bias:0')) # This will print 2, which is the value of bias that we saved. It helps a lot for debugging purposes to see if the model is loaded correctly. the variables can be obtained with "All_varaibles = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES" . Also, "sess.run(tf.global_variables_initializer())" has to be before restore. – LGG May 16 '17 at 00:08
  • 1
    Are you sure we have to run global_variables_initializer again? I restored my graph with global_variable_initialization, and it gives me a different output every time on the same data. So I commented out the initialization and just restored the graph, input variable and ops, and now it works fine. – Aditya Shinde Jun 04 '17 at 20:44
  • @AdityaShinde I don't get why I always get different values every time. And I did not include the variable initialization step for restoring. I'm using my own code btw. – Chaine Jun 07 '17 at 09:35
  • @AdityaShinde: you don't need init op as values are already initialized by restore function, so removed it. However, I am not sure why you did get different output by using init op. – sankit Jun 08 '17 at 06:11
  • Can also freeze the trained model and restore it. As explained here - https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc – Nandeesh Jun 30 '17 at 15:17
  • 6
    @sankit When you restore the tensors why do you add `:0` to the names? – Sahar Rabinoviz Jul 13 '17 at 00:18
  • Your example for the tensors works for me, but not the operation. For the operation I had to use `get_operation_by_name` and without the added `:0`. I.e. `op_to_restore = graph.get_operation_by_name("op_to_restore")` – Carsten Aug 25 '17 at 00:08
181

In (and after) TensorFlow version 0.11.0RC1, you can save and restore your model directly by calling tf.train.export_meta_graph and tf.train.import_meta_graph according to https://www.tensorflow.org/programmers_guide/meta_graph.

Save the model

import tensorflow as tf

w1 = tf.Variable(tf.truncated_normal(shape=[10]), name='w1')
w2 = tf.Variable(tf.truncated_normal(shape=[20]), name='w2')
# Register the variables in a named collection so they can be found after restoring.
tf.add_to_collection('vars', w1)
tf.add_to_collection('vars', w2)
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my-model')
# The `save` method calls `export_meta_graph` implicitly,
# so the saved files include the graph file my-model.meta.

Restore the model

import tensorflow as tf

sess = tf.Session()
new_saver = tf.train.import_meta_graph('my-model.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./'))
all_vars = tf.get_collection('vars')
for v in all_vars:
    v_ = sess.run(v)
    print(v_)
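
Restoring the graph does not recreate the Python names `w1` and `w2` from the saving script; the collection is what carries the handles across. Conceptually, a collection is a list stored under a string key. The sketch below is a plain-Python analogy only (the `Registry` class is hypothetical and is not TensorFlow code):

```python
from collections import defaultdict


class Registry:
    """Toy stand-in for a graph's collections: string key -> list of items."""

    def __init__(self):
        self._collections = defaultdict(list)

    def add_to_collection(self, key, item):
        self._collections[key].append(item)

    def get_collection(self, key):
        # Return a copy so callers cannot mutate the registry by accident.
        return list(self._collections.get(key, []))


registry = Registry()
registry.add_to_collection("vars", "w1")
registry.add_to_collection("vars", "w2")
print(registry.get_collection("vars"))     # ['w1', 'w2']
print(registry.get_collection("missing"))  # []
```

The same idea should extend to placeholders and ops: anything added to a collection before saving can be looked up by key after the meta graph is imported.
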
nbro
lei du
  • 4
    how to load variables from the saved model? How to copy values in some other variable? – neel Dec 19 '16 at 08:58
  • 10
    I am unable to get this code working. The model does get saved but I cannot restore it. It is giving me this error. ` returned a result with an error set` – Saad Qureshi Jan 08 '17 at 09:05
  • 2
    When after restoring I access the variables like shown above, it works. But I cannot get the variables more directly using `tf.get_variable_scope().reuse_variables()` followed by `var = tf.get_variable("varname")`. This gives me the error: "ValueError: Variable varname does not exist, or was not created with tf.get_variable()." Why? Should this not be possible? – jpp1 Jan 12 '17 at 14:16
  • If I add `print sess.run([w1, w2])` in the save section of the code it correctly prints the variables. But if I add that line at the end of the restore code I get an error: `NameError: name 'w1' is not defined`. If the graph and variables are restored then what is wrong here? – Ron Cohen Jan 12 '17 at 17:41
  • @Ron Cohen I managed to get the code work using TF collection mechanism according to the url link metioned above. You can check it again. Sorry. – lei du Jan 14 '17 at 09:14
  • @Iei do you have any comments on this too? http://stackoverflow.com/questions/42885132/what-is-the-right-structure-to-save-restore-a-model-in-tensorflow-during-train – superMind Mar 20 '17 at 18:10
  • 4
    This works well for variables only, but how can you get access to a placeholder and feed values to it after restoring the graph? – kbrose Mar 29 '17 at 16:09
  • I'm getting "KeyError: u'VariableV2'" when running your code for restoring the model. Any idea why? @leidu et al. – ajfbiw.s May 04 '17 at 22:57
  • 12
    This only shows how to restore the variables. How can you restore the entire model and test it on new data without redefining the network? – Chaine Jun 06 '17 at 19:26
  • this line of code gives error: 'new_saver.restore(sess, tf.train.latest_checkpoint('./'))' – Chaine Jun 06 '17 at 20:32
  • https://stackoverflow.com/questions/48083474/finish-tensorflow-training-in-progress – Ruchir Baronia Jan 03 '18 at 18:57
  • 1
    this gives me: Parent directory of model doesn't exist, can't save – Stepan Yakovenko Aug 16 '18 at 05:27
170

Tensorflow 2 Docs

Saving Checkpoints

Adapted from the docs

# -------------------------
# -----  Toy Context  -----
# -------------------------
import tensorflow as tf


class Net(tf.keras.Model):
    """A simple linear model."""

    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)


def toy_dataset():
    inputs = tf.range(10.0)[:, None]
    labels = inputs * 5.0 + tf.range(5.0)[None, :]
    return (
        tf.data.Dataset.from_tensor_slices(dict(x=inputs, y=labels)).repeat().batch(2)
    )


def train_step(net, example, optimizer):
    """Trains `net` on `example` using `optimizer`."""
    with tf.GradientTape() as tape:
        output = net(example["x"])
        loss = tf.reduce_mean(tf.abs(output - example["y"]))
    variables = net.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss


# ----------------------------
# -----  Create Objects  -----
# ----------------------------

net = Net()
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# ----------------------------
# -----  Train and Save  -----
# ----------------------------

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
        print("loss {:1.2f}".format(loss.numpy()))


# ---------------------
# -----  Restore  -----
# ---------------------

# In another script, re-initialize objects
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# Re-use the manager code above ^

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    # Continue training or evaluate etc.
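
To see which checkpoints survive the training loop above, the bookkeeping can be simulated without TensorFlow. This is a plain-Python sketch under two assumptions: checkpoints are numbered by how many saves have happened (`ckpt-1`, `ckpt-2`, ...), and `max_to_keep=3` keeps only the three most recent. The function name `simulate_training` is hypothetical.

```python
from collections import deque


def simulate_training(num_steps=50, save_every=10, max_to_keep=3, start_step=1):
    """Mimic the step counter and keep-last-N retention of the loop above."""
    kept = deque(maxlen=max_to_keep)  # the oldest entry drops out automatically
    step = start_step
    saves = 0
    for _ in range(num_steps):
        step += 1  # mirrors ckpt.step.assign_add(1)
        if step % save_every == 0:
            saves += 1
            kept.append("ckpt-{}".format(saves))
    return step, list(kept)


final_step, kept = simulate_training()
print(final_step)  # 51
print(kept)        # ['ckpt-3', 'ckpt-4', 'ckpt-5']
```

Five saves happen (at steps 10, 20, 30, 40 and 50), and the two oldest are discarded.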

More links

Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.

The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).

(Highlights are my own)
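
The distinction can be illustrated with a toy analogy in plain Python. The JSON layout and the tiny interpreter below are invented for this sketch and have nothing to do with TensorFlow's real file formats; they only show why one kind of file needs the model's source code and the other does not.

```python
import json

# "Checkpoint" analogue: parameter values only.
checkpoint = json.dumps({"w": 2.0, "b": 0.5})

# "SavedModel" analogue: the same values plus a description of the computation.
saved_model = json.dumps({
    "variables": {"w": 2.0, "b": 0.5},
    "ops": [["mul", "x", "w"], ["add", "_", "b"]],  # y = x * w + b
})


def predict_from_checkpoint(blob, x):
    """Needs the model code: the formula x * w + b is hard-coded here."""
    params = json.loads(blob)
    return x * params["w"] + params["b"]


def predict_from_saved_model(blob, x):
    """Needs only a generic interpreter: the formula is read from the file."""
    model = json.loads(blob)
    env = dict(model["variables"], x=x)
    result = None
    for op, left, right in model["ops"]:
        a = result if left == "_" else env[left]  # "_" means the previous result
        b = env[right]
        result = a * b if op == "mul" else a + b
    return result


print(predict_from_checkpoint(checkpoint, 3.0))    # 6.5
print(predict_from_saved_model(saved_model, 3.0))  # 6.5
```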


Tensorflow < 2


From the docs:

Save

# Create some variables.
v1 = tf.get_variable("v1", shape=[3], initializer = tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer = tf.zeros_initializer)

inc_v1 = v1.assign(v1+1)
dec_v2 = v2.assign(v2-1)

# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, and save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  # Do some work with the model.
  inc_v1.op.run()
  dec_v2.op.run()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in path: %s" % save_path)

Restore

tf.reset_default_graph()

# Create some variables.
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Check the values of the variables
  print("v1 : %s" % v1.eval())
  print("v2 : %s" % v2.eval())

simple_save

There are many good answers; for completeness I'll add my 2 cents: simple_save. It comes with a standalone code example using the tf.data.Dataset API.

Python 3 ; Tensorflow 1.14

import tensorflow as tf
from tensorflow.saved_model import tag_constants

with tf.Graph().as_default():
    with tf.Session() as sess:
        ...

        # Saving
        inputs = {
            "batch_size_placeholder": batch_size_placeholder,
            "features_placeholder": features_placeholder,
            "labels_placeholder": labels_placeholder,
        }
        outputs = {"prediction": model_output}
        tf.saved_model.simple_save(
            sess, 'path/to/your/location/', inputs, outputs
        )

Restoring:

restored_graph = tf.Graph()
with restored_graph.as_default():
    with tf.Session() as sess:
        tf.saved_model.loader.load(
            sess,
            [tag_constants.SERVING],
            'path/to/your/location/',
        )
        # Look up the restored tensors by name in the same graph they were loaded into
        batch_size_placeholder = restored_graph.get_tensor_by_name('batch_size_placeholder:0')
        features_placeholder = restored_graph.get_tensor_by_name('features_placeholder:0')
        labels_placeholder = restored_graph.get_tensor_by_name('labels_placeholder:0')
        prediction = restored_graph.get_tensor_by_name('dense/BiasAdd:0')

        sess.run(prediction, feed_dict={
            batch_size_placeholder: some_value,
            features_placeholder: some_other_value,
            labels_placeholder: another_value
        })

Standalone example

Original blog post

The following code generates random data for the sake of the demonstration.

  1. We start by creating the placeholders. They will hold the data at runtime. From them, we create the Dataset and then its Iterator. We get the iterator's generated tensor, called input_tensor which will serve as input to our model.
  2. The model itself is built from input_tensor: a GRU-based bidirectional RNN followed by a dense classifier. Because why not.
  3. The loss is a softmax_cross_entropy_with_logits, optimized with Adam. After 2 epochs (of 2 batches each), we save the "trained" model with tf.saved_model.simple_save. If you run the code as is, then the model will be saved in a folder called simple/ in your current working directory.
  4. In a new graph, we then restore the saved model with tf.saved_model.loader.load. We grab the placeholders and logits with graph.get_tensor_by_name and the Iterator initializing operation with graph.get_operation_by_name.
  5. Lastly we run an inference for both batches in the dataset, and check that the saved and restored model both yield the same values. They do!

Code:

import os
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants


def model(graph, input_tensor):
    """Create the model which consists of
    a bidirectional rnn (GRU(10)) followed by a dense classifier

    Args:
        graph (tf.Graph): Tensors' graph
        input_tensor (tf.Tensor): Tensor fed as input to the model

    Returns:
        tf.Tensor: the model's output layer Tensor
    """
    cell = tf.nn.rnn_cell.GRUCell(10)
    with graph.as_default():
        ((fw_outputs, bw_outputs), (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell,
            cell_bw=cell,
            inputs=input_tensor,
            sequence_length=[10] * 32,
            dtype=tf.float32,
            swap_memory=True,
            scope=None)
        outputs = tf.concat((fw_outputs, bw_outputs), 2)
        mean = tf.reduce_mean(outputs, axis=1)
        dense = tf.layers.dense(mean, 5, activation=None)

        return dense


def get_opt_op(graph, logits, labels_tensor):
    """Create optimization operation from model's logits and labels

    Args:
        graph (tf.Graph): Tensors' graph
        logits (tf.Tensor): The model's output without activation
        labels_tensor (tf.Tensor): Target labels

    Returns:
        tf.Operation: the operation performing a step of Adam optimizer
    """
    with graph.as_default():
        with tf.variable_scope('loss'):
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                    logits=logits, labels=labels_tensor, name='xent'),
                    name="mean-xent"
                    )
        with tf.variable_scope('optimizer'):
            opt_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
        return opt_op


if __name__ == '__main__':
    # Set random seed for reproducibility
    # and create synthetic data
    np.random.seed(0)
    features = np.random.randn(64, 10, 30)
    labels = np.eye(5)[np.random.randint(0, 5, (64,))]

    graph1 = tf.Graph()
    with graph1.as_default():
        # Random seed for reproducibility
        tf.set_random_seed(0)
        # Placeholders
        batch_size_ph = tf.placeholder(tf.int64, name='batch_size_ph')
        features_data_ph = tf.placeholder(tf.float32, [None, None, 30], 'features_data_ph')
        labels_data_ph = tf.placeholder(tf.int32, [None, 5], 'labels_data_ph')
        # Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features_data_ph, labels_data_ph))
        dataset = dataset.batch(batch_size_ph)
        iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
        dataset_init_op = iterator.make_initializer(dataset, name='dataset_init')
        input_tensor, labels_tensor = iterator.get_next()

        # Model
        logits = model(graph1, input_tensor)
        # Optimization
        opt_op = get_opt_op(graph1, logits, labels_tensor)

        with tf.Session(graph=graph1) as sess:
            # Initialize variables
            tf.global_variables_initializer().run(session=sess)
            for epoch in range(3):
                batch = 0
                # Initialize dataset (could feed epochs in Dataset.repeat(epochs))
                sess.run(
                    dataset_init_op,
                    feed_dict={
                        features_data_ph: features,
                        labels_data_ph: labels,
                        batch_size_ph: 32
                    })
                values = []
                while True:
                    try:
                        if epoch < 2:
                            # Training
                            _, value = sess.run([opt_op, logits])
                            print('Epoch {}, batch {} | Sample value: {}'.format(epoch, batch, value[0]))
                            batch += 1
                        else:
                            # Final inference
                            values.append(sess.run(logits))
                            print('Epoch {}, batch {} | Final inference | Sample value: {}'.format(epoch, batch, values[-1][0]))
                            batch += 1
                    except tf.errors.OutOfRangeError:
                        break
            # Save model state
            print('\nSaving...')
            cwd = os.getcwd()
            path = os.path.join(cwd, 'simple')
            shutil.rmtree(path, ignore_errors=True)
            inputs_dict = {
                "batch_size_ph": batch_size_ph,
                "features_data_ph": features_data_ph,
                "labels_data_ph": labels_data_ph
            }
            outputs_dict = {
                "logits": logits
            }
            tf.saved_model.simple_save(
                sess, path, inputs_dict, outputs_dict
            )
            print('Ok')
    # Restoring
    graph2 = tf.Graph()
    with graph2.as_default():
        with tf.Session(graph=graph2) as sess:
            # Restore saved values
            print('\nRestoring...')
            tf.saved_model.loader.load(
                sess,
                [tag_constants.SERVING],
                path
            )
            print('Ok')
            # Get restored placeholders
            labels_data_ph = graph2.get_tensor_by_name('labels_data_ph:0')
            features_data_ph = graph2.get_tensor_by_name('features_data_ph:0')
            batch_size_ph = graph2.get_tensor_by_name('batch_size_ph:0')
            # Get restored model output
            restored_logits = graph2.get_tensor_by_name('dense/BiasAdd:0')
            # Get dataset initializing operation
            dataset_init_op = graph2.get_operation_by_name('dataset_init')

            # Initialize restored dataset
            sess.run(
                dataset_init_op,
                feed_dict={
                    features_data_ph: features,
                    labels_data_ph: labels,
                    batch_size_ph: 32
                }

            )
            # Compute inference for both batches in dataset
            restored_values = []
            for i in range(2):
                restored_values.append(sess.run(restored_logits))
                print('Restored values: ', restored_values[i][0])

    # Check if original inference and restored inference are equal
    valid = all((v == rv).all() for v, rv in zip(values, restored_values))
    print('\nInferences match: ', valid)

This will print:

$ python3 save_and_restore.py

Epoch 0, batch 0 | Sample value: [-0.13851789 -0.3087595   0.12804556  0.20013677 -0.08229901]
Epoch 0, batch 1 | Sample value: [-0.00555491 -0.04339041 -0.05111827 -0.2480045  -0.00107776]
Epoch 1, batch 0 | Sample value: [-0.19321944 -0.2104792  -0.00602257  0.07465433  0.11674127]
Epoch 1, batch 1 | Sample value: [-0.05275984  0.05981954 -0.15913513 -0.3244143   0.10673307]
Epoch 2, batch 0 | Final inference | Sample value: [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Epoch 2, batch 1 | Final inference | Sample value: [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Saving...
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'/some/path/simple/saved_model.pb'
Ok

Restoring...
INFO:tensorflow:Restoring parameters from b'/some/path/simple/variables/variables'
Ok
Restored values:  [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Restored values:  [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Inferences match:  True
ted
  • 1
    I'm beginner and I need more explanation... : If I have a CNN model, should I store just 1. inputs_placeholder 2. labels_placeholder, and 3. output_of_cnn? Or all the intermediate `tf.contrib.layers`? – NeoZoom.lua Jun 16 '18 at 11:43
  • 3
    The graph is entirely restored. You could check it running `[n.name for n in graph2.as_graph_def().node]`. As the documentation says, simple save is aimed at simplifying the interaction with tensorflow serving, this is the point of the arguments; other variables are however still restored, otherwise inference would not happen. Just grab your variables of interest as I did in the example. Check out the [**documentation**](https://www.tensorflow.org/api_docs/python/tf/saved_model/simple_save) – ted Jun 16 '18 at 12:25
  • @ted when would I use tf.saved_model.simple_save vs tf.train.Saver()? From my intuition I would use tf.train.Saver() during training and to store different moments in time. I would use tf.saved_model.simple_save when training is done to use in production. (I asked the same also in a comment [here](https://stackoverflow.com/questions/52132450/in-tensorflows-low-level-api-is-it-possible-to-save-a-graph-with-an-optimizer/52132527#comment91303149_52132527)) – loco.loop Sep 05 '18 at 03:02
  • @ted hey, thanks for the reply... just checking if "yes I'd say it's a good way to go" was a response to my comment... lol – loco.loop Sep 06 '18 at 02:10
  • I get an error `KeyError: u'ImageProjectiveTransform'` when loading the model again afterwards. I saved all the input placeholders and output logits. I used `tf.contrib.image.rotate` as a Image augmentation technique but didn't add that to the dicts while saving the model. Is that the source of the error? – Siladittya Sep 10 '18 at 16:20
  • Probably. If not you should ask a standalone question – ted Sep 11 '18 at 06:07
  • I asked https://stackoverflow.com/questions/52260945/keyerror-uimageprojectivetransform-when-loading-tensorflow-model No one answered – Siladittya Sep 11 '18 at 06:49
  • 2
    Nice I guess, but does it also work with Eager mode models and tfe.Saver? – Geoffrey Anderson Sep 26 '18 at 18:47
  • simple_save is now deprecated: https://www.tensorflow.org/api_docs/python/tf/saved_model/simple_save – Overdrivr Mar 11 '19 at 09:29
  • @Overdrivr: what's the recommended way of doing it then? – eugene Mar 14 '19 at 08:33
  • 1
    without `global_step` as an argument, if you stop then try to pick up training again, it will think you are one step one. It will screw up your tensorboard visualizations at the very least – Monica Heddneck Apr 20 '19 at 01:00
  • @ted I just saw some real strange code that I couldn't figure out. 'tf.train.Saver(self.network.vars).restore(self.sess, trained_model_path)', the self.network.vars returns '[var for var in tf.global_variables() if self.name in var.name]' – June Wang Aug 25 '19 at 13:07
  • Does this solution work for stateful models? I have a stateful LSTM, and I am using a similar solution. After I get a new window, I give the model a new window and expect the model to remember the states from the previous windows. In inference, it looks like it is not remembering the LSTM states and is resetting at each call. – mickey Feb 12 '20 at 16:00
  • 2
    I'm trying to call restore and getting this error `ValueError: No variables to save`. Can anyone help? – Elaine Chen Mar 09 '20 at 20:38
  • How would one save/restore the optimizer as well used if training the model? If simply using the above method to restore, the optimizer does not start from where training previously ended. I've opened a new question [here](https://stackoverflow.com/questions/62198840/save-and-load-custom-optimizers-for-continued-training-in-tensorflow) but perhaps it's apt to also add as a solution here. – Mathews24 Jun 06 '20 at 14:02
  • You said "exhaustive and useful tutorial"... I gave that tutorial 1 star because it's exhaustive and useless :) – Bersan Jun 09 '20 at 10:08
  • where is ``restored_graph``? in the example tensorflow < 2? – ArtificiallyIntelligence Sep 22 '20 at 20:21
  • @ElaineChen Same error `ValueError: No variables to save`, this helps: `sess = tf.Session() saver = tf.train.import_meta_graph('model_ckpt_dir/model.ckpt.meta') saver.restore(sess, 'model_ckpt_dir/model.ckpt')` – mrgloom Oct 06 '20 at 10:10
  • @ted In the `SavedModel` format that is suppose to work for C, the `pb` file does not pull up in ML.NET. Here is a link to my [inquiry](https://stackoverflow.com/questions/64794378/correct-pb-file-to-move-tensorflow-model-into-ml-net) on Overflow if you have any suggestions. Thanks. – Josh Nov 25 '20 at 17:33
  • In tf2 eager mode, how do you restore only some of the variables from a checkpoint, using the tf.train.Checkpoint syntax presented above? I dug through the implementation but got stumped by the TrackableSaver class, which has very different api from tf.train.Saver, and doesn't seem to allow passing in a list of variables to restore. – John Jiang Dec 01 '21 at 19:34
130

For TensorFlow version < 0.11.0RC1:

The checkpoints that are saved contain values for the Variables in your model, not the model/graph itself, which means that the graph should be the same when you restore the checkpoint.
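
Why the graph has to match can be seen from a toy model of the restore step: saved values are looked up by variable name, so a variable that was renamed or added after saving has nothing to load. The sketch below is plain Python with hypothetical names (`restore`, and the keys `"w"`, `"b"`, `"bias"`); it mimics the matching rule only and does not use TensorFlow.

```python
def restore(variables, checkpoint):
    """Copy saved values into `variables`, matching strictly by name."""
    missing = sorted(name for name in variables if name not in checkpoint)
    if missing:
        raise KeyError("not found in checkpoint: " + ", ".join(missing))
    for name in variables:
        variables[name] = checkpoint[name]
    return variables


checkpoint = {"w": [[0.7]], "b": [[0.1]]}  # values written by a training run

same_graph = {"w": None, "b": None}
print(restore(same_graph, checkpoint))     # {'w': [[0.7]], 'b': [[0.1]]}

changed_graph = {"w": None, "bias": None}  # variable renamed after saving
try:
    restore(changed_graph, checkpoint)
except KeyError as err:
    print("restore failed:", err)
```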

Here's an example for a linear regression where there's a training loop that saves variable checkpoints and an evaluation section that will restore variables saved in a prior run and compute predictions. Of course, you can also restore variables and continue training if you'd like.

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

w = tf.Variable(tf.zeros([1, 1], dtype=tf.float32))
b = tf.Variable(tf.ones([1, 1], dtype=tf.float32))
y_hat = tf.add(b, tf.matmul(x, w))

...more setup for optimization and what not...

saver = tf.train.Saver()  # defaults to saving all variables - in this case w and b

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    if FLAGS.train:
        for i in xrange(FLAGS.training_steps):
            ...training loop...
            if (i + 1) % FLAGS.checkpoint_steps == 0:
                saver.save(sess, FLAGS.checkpoint_dir + 'model.ckpt',
                           global_step=i+1)
    else:
        # Here's where you're restoring the variables w and b.
        # Note that the graph is exactly as it was when the variables were
        # saved in a prior training run.
        ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        else:
            ...no checkpoint found...

        # Now you can run the model to get predictions
        batch_x = ...load some data...
        predictions = sess.run(y_hat, feed_dict={x: batch_x})

Here are the docs for Variables, which cover saving and restoring. And here are the docs for the Saver.
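
`FLAGS` in the snippet above is user-defined and not shown. One way to supply it, sketched here with the standard library's `argparse` as a stand-in for whatever flag mechanism the original used, is below. The flag names come from the snippet; the default values and the helper name `parse_flags` are assumptions made for illustration.

```python
import argparse


def parse_flags(argv=None):
    """Build a FLAGS-like namespace with the attributes the snippet reads."""
    parser = argparse.ArgumentParser(description="Train or evaluate the linear model.")
    parser.add_argument("--train", action="store_true",
                        help="train instead of evaluating")
    parser.add_argument("--training_steps", type=int, default=1000)
    parser.add_argument("--checkpoint_steps", type=int, default=100)
    # Trailing slash matters: the snippet concatenates checkpoint_dir + 'model.ckpt'.
    parser.add_argument("--checkpoint_dir", default="/tmp/linear_model/")
    return parser.parse_args(argv)


FLAGS = parse_flags(["--train", "--training_steps", "300"])

# Steps at which the training loop's save condition fires.
saved_at = [i + 1 for i in range(FLAGS.training_steps)
            if (i + 1) % FLAGS.checkpoint_steps == 0]
print(saved_at)  # [100, 200, 300]
print(FLAGS.checkpoint_dir + "model.ckpt")
```

In a real script, calling `parse_flags()` with no argument would read the command line instead of the hard-coded list.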

abhinonymous
Ryan Sepassi
  • 1
    FLAGS are user-defined. Here's an example of defining them: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/fully_connected_feed.py – Ryan Sepassi Mar 26 '16 at 01:19
  • in which format does `batch_x` need to be? Binary? Numpy array? – pepe Jun 05 '16 at 16:27
  • @pepe Numpy arrary should be fine. And the element's type should correspond to the type of the placeholder. [link]https://www.tensorflow.org/versions/r0.9/api_docs/python/framework.html#tensor-types – Donny Jun 09 '16 at 16:46
  • FLAGS gives error `undefined`. Can you tell me which is def of FLAGS for this code. @RyanSepassi – Muhammad Hannan Dec 09 '16 at 20:36
  • To make it explicit: Recent versions of Tensorflow **do** allow to store the model/graph. [It was unclear to me, which aspects of the answer apply to the <0.11 constraint. Given the large number of upvotes I was tempted to believe that this general statement is still true for recent versions.] – bluenote10 Apr 18 '17 at 16:27
  • Hi, been a while since this post, but can you tell me what y_hat is - my model is saved as ckpt, meta and data files and Im training an image segmentation model, the output is segmented boxes as opposed to a variable y_hat. – El_1988 Jan 22 '20 at 13:26
  • How would one save/restore the optimizer as well used if training the model? If simply using the above method to restore, the optimizer does not start from where training previously ended. I've opened a new question [here](https://stackoverflow.com/questions/62198840/save-and-load-custom-optimizers-for-continued-training-in-tensorflow) but perhaps it's apt to also add as a solution here. – Mathews24 Jun 06 '20 at 14:04
84

My environment: Python 3.6, Tensorflow 1.3.0

Though there have been many solutions, most of them are based on tf.train.Saver. When we load a .ckpt saved by Saver, we have to either redefine the TensorFlow network or use some odd, hard-to-remember names, e.g. 'placehold_0:0', 'dense/Adam/Weight:0'. Here I recommend using tf.saved_model. The simplest example is given below; you can learn more from Serving a TensorFlow Model:

Save the model:

import tensorflow as tf

# define the tensorflow network and do some training
x = tf.placeholder("float", name="x")
w = tf.Variable(2.0, name="w")
b = tf.Variable(0.0, name="bias")

h = tf.multiply(x, w)
y = tf.add(h, b, name="y")
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# save the model
export_path =  './savedmodel'
builder = tf.saved_model.builder.SavedModelBuilder(export_path)

tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
tensor_info_y = tf.saved_model.utils.build_tensor_info(y)

prediction_signature = (
  tf.saved_model.signature_def_utils.build_signature_def(
      inputs={'x_input': tensor_info_x},
      outputs={'y_output': tensor_info_y},
      method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

builder.add_meta_graph_and_variables(
  sess, [tf.saved_model.tag_constants.SERVING],
  signature_def_map={
      tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
          prediction_signature 
  },
  )
builder.save()

Load the model:

import tensorflow as tf
sess=tf.Session() 
signature_key = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
input_key = 'x_input'
output_key = 'y_output'

export_path =  './savedmodel'
meta_graph_def = tf.saved_model.loader.load(
           sess,
          [tf.saved_model.tag_constants.SERVING],
          export_path)
signature = meta_graph_def.signature_def

x_tensor_name = signature[signature_key].inputs[input_key].name
y_tensor_name = signature[signature_key].outputs[output_key].name

x = sess.graph.get_tensor_by_name(x_tensor_name)
y = sess.graph.get_tensor_by_name(y_tensor_name)

y_out = sess.run(y, {x: 3.0})
Mike S.
  • 118
  • 7
William
  • 4,258
  • 2
  • 23
  • 20
  • 4
    +1 for a great example of the SavedModel API. However, I wish your **Save the model** section showed a training loop like Ryan Sepassi's answer! I realize this is an old question, but this response is one of the few (and valuable) examples of SavedModel I found on Google. – Dylan F Dec 26 '17 at 03:07
  • @Tom this is a great answer - only one aimed at the new SavedModel. Could you have a look at this SavedModel question? https://stackoverflow.com/questions/48540744/tensorflow-savedmodel-how-to-iterative-save – bluesummers Feb 11 '18 at 15:10
  • Now make it all work correctly with TF Eager models. Google advised in their 2018 presentation for everyone to get away from TF graph code. – Geoffrey Anderson Sep 26 '18 at 18:50
55

There are two parts to the model: the model definition, saved by Supervisor as graph.pbtxt in the model directory, and the numerical values of tensors, saved into checkpoint files like model.ckpt-1003418.

The model definition can be restored using tf.import_graph_def, and the weights are restored using Saver.

However, Saver uses a special collection holding the list of variables attached to the model Graph, and this collection is not initialized by import_graph_def, so you can't use the two together at the moment (it's on our roadmap to fix). For now, you have to use Ryan Sepassi's approach: manually construct a graph with identical node names, and use Saver to load the weights into it.

(Alternatively, you could hack it by using import_graph_def, creating variables manually, and calling tf.add_to_collection(tf.GraphKeys.VARIABLES, variable) for each variable, then using Saver.)

David Silva-Barrera
  • 1,006
  • 8
  • 12
Yaroslav Bulatov
  • 57,332
  • 22
  • 139
  • 197
  • In the classify_image.py example that uses inceptionv3, only the graphdef is loaded. Does it mean that now the GraphDef also contains the Variable ? – jrabary Feb 05 '16 at 20:42
  • 1
    @jrabary The model has probably been [frozen](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py). – Eric Platon Mar 21 '16 at 02:27
  • 1
    Hey, I'm new to tensorflow and am having trouble saving my model. I would really appreciate it if you could help me https://stackoverflow.com/questions/48083474/finish-tensorflow-training-in-progress – Ruchir Baronia Jan 03 '18 at 18:58
38

You can also take this easier way.

Step 1: initialize all your variables

W1 = tf.Variable(tf.truncated_normal([6, 6, 1, K], stddev=0.1), name="W1")
B1 = tf.Variable(tf.constant(0.1, tf.float32, [K]), name="B1")

Similarly, W2, B2, W3, .....

Step 2: create a Saver and use it to save the session

model_saver = tf.train.Saver()

# Train the model and save it in the end
model_saver.save(session, "saved_models/CNN_New.ckpt")

Step 3: restore the model

with tf.Session(graph=graph_cnn) as session:
    model_saver.restore(session, "saved_models/CNN_New.ckpt")
    print("Model restored.") 
    print('Initialized')

Step 4: check your variable

W1 = session.run(W1)
print(W1)

When running in a different Python instance, use

with tf.Session() as sess:
    # Rebuild the graph from the .meta file; this also returns a Saver
    saver = tf.train.import_meta_graph('saved_models/CNN_New.ckpt.meta')

    # Restore latest checkpoint (do not run the variable initializer
    # afterwards, it would overwrite the restored values)
    saver.restore(sess, tf.train.latest_checkpoint('saved_models/'))

    # Get default graph (supply your custom graph if you have one)
    graph = tf.get_default_graph()

    # It will give tensor object
    W1 = graph.get_tensor_by_name('W1:0')

    # To get the value (numpy array)
    W1_value = sess.run(W1)
nbro
  • 15,395
  • 32
  • 113
  • 196
Himanshu Babal
  • 645
  • 11
  • 18
  • Hi, How can I save the model after suppose 3000 iterations, similar to Caffe. I found out that tensorflow save only last models despite that I concatenate iteration number with model to differentiate it among all iterations. I mean model_3000.ckpt, model_6000.ckpt, --- model_100000.ckpt. Can you kindly explain why it doesn't save all rather saves only last 3 iterations. – khan Apr 04 '17 at 10:32
  • 2
    @khan see http://stackoverflow.com/questions/38265061/tensorflow-missing-checkpoint-files-does-saver-only-allow-for-keeping-5-check – Himanshu Babal Apr 14 '17 at 21:28
  • 3
    Is there a method to get all the variables/operation names saved within the graph? – Moondra Oct 11 '17 at 17:36
21

In most cases, saving and restoring from disk using a tf.train.Saver is your best option:

... # build your model
saver = tf.train.Saver()

with tf.Session() as sess:
    ... # train the model
    saver.save(sess, "/tmp/my_great_model")

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model

You can also save/restore the graph structure itself (see the MetaGraph documentation for details). By default, the Saver saves the graph structure into a .meta file. You can call import_meta_graph() to restore it. It restores the graph structure and returns a Saver that you can use to restore the model's state:

saver = tf.train.import_meta_graph("/tmp/my_great_model.meta")

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model

However, there are cases where you need something much faster. For example, if you implement early stopping, you want to save checkpoints every time the model improves during training (as measured on the validation set), then if there is no progress for some time, you want to roll back to the best model. If you save the model to disk every time it improves, it will tremendously slow down training. The trick is to save the variable states to memory, then just restore them later:

... # build your model

# get a handle on the graph nodes we need to save/restore the model
graph = tf.get_default_graph()
gvars = graph.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
assign_ops = [graph.get_operation_by_name(v.op.name + "/Assign") for v in gvars]
init_values = [assign_op.inputs[1] for assign_op in assign_ops]

with tf.Session() as sess:
    ... # train the model

    # when needed, save the model state to memory
    gvars_state = sess.run(gvars)

    # when needed, restore the model state
    feed_dict = {init_value: val
                 for init_value, val in zip(init_values, gvars_state)}
    sess.run(assign_ops, feed_dict=feed_dict)

A quick explanation: when you create a variable X, TensorFlow automatically creates an assignment operation X/Assign to set the variable's initial value. Instead of creating placeholders and extra assignment ops (which would just make the graph messy), we just use these existing assignment ops. The first input of each assignment op is a reference to the variable it is supposed to initialize, and the second input (assign_op.inputs[1]) is the initial value. So in order to set any value we want (instead of the initial value), we need to use a feed_dict and replace the initial value. Yes, TensorFlow lets you feed a value for any op, not just for placeholders, so this works fine.
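The early-stopping bookkeeping described above (snapshot the state in memory whenever validation improves, roll back once progress stalls) can be sketched independently of TensorFlow. This is only an illustrative sketch: `get_state`, `set_state`, `train_step` and `validate` are hypothetical callables, not part of any library. In the TF 1.x setting of this answer they would presumably wrap `sess.run(gvars)` and `sess.run(assign_ops, feed_dict=...)`.

```python
import copy


def train_with_early_stopping(get_state, set_state, train_step, validate,
                              max_steps, patience):
    """Train, keep the best state in memory, and roll back to it at the end.

    All four callables are hypothetical hooks supplied by the caller.
    """
    best_loss = float("inf")
    best_state = copy.deepcopy(get_state())
    steps_without_progress = 0

    for _ in range(max_steps):
        train_step()
        loss = validate()
        if loss < best_loss:
            # Validation improved: take an in-memory snapshot, no disk I/O.
            best_loss = loss
            best_state = copy.deepcopy(get_state())
            steps_without_progress = 0
        else:
            steps_without_progress += 1
            if steps_without_progress >= patience:
                break

    # Roll back to the best state seen during training.
    set_state(best_state)
    return best_loss


# Toy usage: the "model" is a dict with one number, and the validation
# losses are scripted so that the second step is the best one.
weights = {"w": 0.0}
scripted_losses = iter([5.0, 3.0, 4.0, 4.5, 6.0])


def toy_train_step():
    weights["w"] += 1.0


best = train_with_early_stopping(
    get_state=lambda: weights,
    set_state=weights.update,
    train_step=toy_train_step,
    validate=lambda: next(scripted_losses),
    max_steps=5,
    patience=2,
)
```

The snapshot is deep-copied so that later training steps cannot mutate it. With NumPy arrays returned by `sess.run` a copy is already made, so the `deepcopy` would mostly matter for mutable Python containers like the one in this toy.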

nbro
  • 15,395
  • 32
  • 113
  • 196
MiniQuark
  • 46,633
  • 36
  • 147
  • 183
  • Thanks for the answer. I have got a similar question on how to convert a single .ckpt file to two .index and .data file (say for pre-trained inception models available on tf.slim). My question is here: https://stackoverflow.com/questions/47762114/converting-a-pb-file-to-meta-in-tf-1-3 – Amir Dec 12 '17 at 09:38
  • Hey, I'm new to tensorflow and am having trouble saving my model. I would really appreciate it if you could help me https://stackoverflow.com/questions/48083474/finish-tensorflow-training-in-progress – Ruchir Baronia Jan 03 '18 at 18:59
17

As Yaroslav said, you can hack restoring from a graph_def and checkpoint by importing the graph, manually creating variables, and then using a Saver.

I implemented this for my personal use, so I thought I'd share the code here.

Link: https://gist.github.com/nikitakit/6ef3b72be67b86cb7868

(This is, of course, a hack, and there is no guarantee that models saved this way will remain readable in future versions of TensorFlow.)

nikitakit
  • 183
  • 6
14

If it is an internally saved model, you just specify a restorer for all variables as

restorer = tf.train.Saver(tf.all_variables())

and use it to restore variables in a current session:

restorer.restore(self._sess, model_file)

For an external model, you need to specify the mapping from its variable names to your variable names. You can view the model's variable names using the command

python /path/to/tensorflow/tensorflow/python/tools/inspect_checkpoint.py --file_name=/path/to/pretrained_model/model.ckpt

The inspect_checkpoint.py script can be found in './tensorflow/python/tools' folder of the Tensorflow source.

To specify the mapping, you can use my Tensorflow-Worklab, which contains a set of classes and scripts to train and retrain different models. It includes an example of retraining ResNet models, located here

Sergey Demyanov
  • 2,655
  • 1
  • 17
  • 9
  • `all_variables()` is now deprecated – MiniQuark May 31 '17 at 20:08
  • Hey, I'm new to tensorflow and am having trouble saving my model. I would really appreciate it if you could help me https://stackoverflow.com/questions/48083474/finish-tensorflow-training-in-progress – Ruchir Baronia Jan 03 '18 at 19:02
12

Here's my simple solution for the two basic cases differing on whether you want to load the graph from file or build it during runtime.

This answer holds for Tensorflow 0.12+ (including 1.0).

Rebuilding the graph in code

Saving

graph = ... # build the graph
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.save(sess, 'my-model')

Loading

graph = ... # build the graph
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    # now you can use the graph, continue training or whatever

Loading also the graph from a file

When using this technique, make sure all your layers/variables have explicitly set unique names. Otherwise Tensorflow will make the names unique itself and they'll be thus different from the names stored in the file. It's not a problem in the previous technique, because the names are "mangled" the same way in both loading and saving.

Saving

graph = ... # build the graph

for op in [ ... ]:  # operators you want to use after restoring the model
    tf.add_to_collection('ops_to_restore', op)

saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.save(sess, 'my-model')

Loading

with ... as sess:  # your session object
    saver = tf.train.import_meta_graph('my-model.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    ops = tf.get_collection('ops_to_restore')  # here are your operators in the same order in which you saved them to the collection
Martin Pecka
  • 2,953
  • 1
  • 31
  • 40
  • -1 Starting your answer by dismissing "all other answers here" is a bit harsh. That said, I downvoted for other reasons: you should definitely save all global variables, not just the trainable variables. For example, the `global_step` variable and the moving averages of batch normalization are non-trainable variables, but both are definitely worth saving. Also, you should more clearly distinguish the construction of the graph from running the session, for example `Saver(...).save()` will create new nodes every time you run it. Probably not what you want. And there's more... :/ – MiniQuark May 31 '17 at 19:54
  • @MiniQuark ok, thanks for your feedback, I'll edit the answer according to your suggestions ;) – Martin Pecka Jun 01 '17 at 08:10
11

tf.keras Model saving with TF2.0

I see great answers for saving models using TF1.x. I want to provide a couple more pointers on saving tensorflow.keras models, which is a little complicated because there are many ways to save a model.

Here I am providing an example of saving a tensorflow.keras model to model_path folder under current directory. This works well with most recent tensorflow (TF2.0). I will update this description if there is any change in near future.

Saving and loading entire model

import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist

#import data
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# create a model
def create_model():
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
# compile the model
  model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
  return model

# Create a basic model instance
model=create_model()

model.fit(x_train, y_train, epochs=1)
loss, acc = model.evaluate(x_test, y_test,verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))

# Save entire model to a HDF5 file
model.save('./model_path/my_model.h5')

# Recreate the exact same model, including weights and optimizer.
new_model = keras.models.load_model('./model_path/my_model.h5')
loss, acc = new_model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Saving and loading model Weights only

If you only want to save the model weights and later load them to restore the model, then

model.fit(x_train, y_train, epochs=5)
loss, acc = model.evaluate(x_test, y_test,verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))

# Save the weights
model.save_weights('./checkpoints/my_checkpoint')

# Restore the weights
model = create_model()
model.load_weights('./checkpoints/my_checkpoint')

loss,acc = model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Saving and restoring using keras checkpoint callback

import os

# include the epoch in the file name. (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    checkpoint_path, verbose=1, save_weights_only=True,
    # Save weights, every 5-epochs.
    period=5)

model = create_model()
model.save_weights(checkpoint_path.format(epoch=0))
# x_train, y_train, x_test, y_test are the MNIST arrays loaded above
model.fit(x_train, y_train,
          epochs = 50, callbacks = [cp_callback],
          validation_data = (x_test, y_test),
          verbose=0)

latest = tf.train.latest_checkpoint(checkpoint_dir)

new_model = create_model()
new_model.load_weights(latest)
loss, acc = new_model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Saving model with custom metrics

import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Custom Loss1 (for example) 
@tf.function() 
def customLoss1(yTrue,yPred):
  return tf.reduce_mean(yTrue-yPred) 

# Custom Loss2 (for example) 
@tf.function() 
def customLoss2(yTrue, yPred):
  return tf.reduce_mean(tf.square(tf.subtract(yTrue,yPred))) 
  
def create_model():
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),  
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
  model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy', customLoss1, customLoss2])
  return model

# Create a basic model instance
model=create_model()

# Fit and evaluate model 
model.fit(x_train, y_train, epochs=1)
loss, acc,loss1, loss2 = model.evaluate(x_test, y_test,verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))

model.save("./model.h5")

new_model=tf.keras.models.load_model("./model.h5",custom_objects={'customLoss1':customLoss1,'customLoss2':customLoss2})

Saving keras model with custom ops

When we have custom ops as in the following case (tf.tile), we need to create a function and wrap with a Lambda layer. Otherwise, model cannot be saved.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras import Model

def my_fun(a):
  out = tf.tile(a, (1, tf.shape(a)[0]))
  return out

a = Input(shape=(10,))
#out = tf.tile(a, (1, tf.shape(a)[0]))
out = Lambda(lambda x : my_fun(x))(a)
model = Model(a, out)

x = np.zeros((50,10), dtype=np.float32)
print(model(x).numpy())

model.save('my_model.h5')

#load the model
new_model=tf.keras.models.load_model("my_model.h5")

I think I have covered a few of the many ways of saving tf.keras model. However, there are many other ways. Please comment below if you see your use case is not covered above.

desertnaut
  • 57,590
  • 26
  • 140
  • 166
Vishnuvardhan Janapati
  • 3,088
  • 1
  • 16
  • 25
9

If you use tf.train.MonitoredTrainingSession as the default session, you don't need to add extra code to save or restore. Just pass a checkpoint dir name to MonitoredTrainingSession's constructor, and it will use session hooks to handle both.
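A minimal sketch of this, assuming a TensorFlow build that ships `tensorflow.compat.v1` (1.14 or later, including 2.x). The `run_training` helper, the `counter` variable, and the temporary directory are made up for illustration. The second call should pick up where the first one stopped, because the session restores the latest checkpoint it finds in `checkpoint_dir`.

```python
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()


def run_training(checkpoint_dir, num_steps):
    """Run a toy 'training' loop; saving and restoring is left to the session."""
    tf.reset_default_graph()

    # The checkpoint hooks need a global step in the graph.
    global_step = tf.train.get_or_create_global_step()
    counter = tf.get_variable("counter", initializer=0.0)

    # Stand-in for a real training op: bump the counter and the global step.
    train_op = tf.group(counter.assign_add(1.0), global_step.assign_add(1))

    # No explicit Saver: the session's hooks restore on creation and save
    # periodically and on exit.
    with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir) as sess:
        for _ in range(num_steps):
            sess.run(train_op)
        return sess.run(counter)


checkpoint_dir = tempfile.mkdtemp()
first = run_training(checkpoint_dir, num_steps=3)   # starts from a fresh variable
second = run_training(checkpoint_dir, num_steps=3)  # resumes from the checkpoint
```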

Changming Sun
  • 857
  • 2
  • 7
  • 19
  • using [tf.train.Supervisor](https://www.tensorflow.org/programmers_guide/supervisor) will handle creating such a session for you, and provides a more complete solution. – Mark Jun 06 '17 at 20:23
  • 1
    @Mark tf.train.Supervisor is deprecated – Changming Sun Jun 08 '17 at 06:27
  • Do you have any link supporting the claim that Supervisor is deprecated? I didn't see anything that indicates this to be the case. – Mark Jun 08 '17 at 12:59
  • @Mark https://stackoverflow.com/questions/41643044/what-is-the-difference-between-tf-train-monitoredtrainingsession-and-tf-train-su – Changming Sun Jun 09 '17 at 10:59
  • Thanks for the URL -- I checked with the original source of the information, and was told it will probably be around until the end of the TF 1.x series, but no guarantees after that. – Mark Jun 09 '17 at 17:54
8

All the answers here are great, but I want to add two things.

First, to elaborate on @user7505159's answer, the "./" can be important to add to the beginning of the file name that you are restoring.

For example, you can save a graph with no "./" in the file name like so:

# Some graph defined up here with specific names

saver = tf.train.Saver()
save_file = 'model.ckpt'

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

But in order to restore the graph, you may need to prepend a "./" to the file_name:

# Same graph defined up here

saver = tf.train.Saver()
save_file = './' + 'model.ckpt' # String addition used for emphasis

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, save_file)

You will not always need the "./", but it can cause problems depending on your environment and version of TensorFlow.
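If you would rather not remember when the prefix is needed, a small helper can add it for you. This is only a sketch: `as_explicit_checkpoint_path` is a hypothetical name, and the assumption is simply that a bare file name (no directory part) is the case that causes trouble.

```python
import os


def as_explicit_checkpoint_path(save_path):
    """Prefix bare file names with './' so the path has a directory part."""
    if os.path.dirname(save_path) == "":
        return "./" + save_path
    # Paths that already name a directory are left untouched.
    return save_path


bare = as_explicit_checkpoint_path("model.ckpt")
already_relative = as_explicit_checkpoint_path("./model.ckpt")
absolute = as_explicit_checkpoint_path("/tmp/model.ckpt")
```

The result can then be passed to `saver.save` and `saver.restore` alike, so both calls always see the same path.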

I also want to mention that sess.run(tf.global_variables_initializer()) can be important before restoring the session.

If you are receiving an error regarding uninitialized variables when trying to restore a saved session, make sure you include sess.run(tf.global_variables_initializer()) before the saver.restore(sess, save_file) line. It can save you a headache.

saetch_g
  • 1,427
  • 10
  • 10
8

According to the new Tensorflow version, tf.train.Checkpoint is the preferable way of saving and restoring a model:

Checkpoint.save and Checkpoint.restore write and read object-based checkpoints, in contrast to tf.train.Saver which writes and reads variable.name based checkpoints. Object-based checkpointing saves a graph of dependencies between Python objects (Layers, Optimizers, Variables, etc.) with named edges, and this graph is used to match variables when restoring a checkpoint. It can be more robust to changes in the Python program, and helps to support restore-on-create for variables when executing eagerly. Prefer tf.train.Checkpoint over tf.train.Saver for new code.

Here is an example:

import tensorflow as tf
import os

tf.enable_eager_execution()

checkpoint_directory = "/tmp/training_checkpoints"
checkpoint_prefix = os.path.join(checkpoint_directory, "ckpt")

checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
status = checkpoint.restore(tf.train.latest_checkpoint(checkpoint_directory))
for _ in range(num_training_steps):
  optimizer.minimize( ... )  # Variables will be restored on creation.
status.assert_consumed()  # Optional sanity checks.
checkpoint.save(file_prefix=checkpoint_prefix)

More information and example here.
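For a self-contained round trip, here is a minimal sketch assuming TensorFlow 2.x, where eager execution is on by default. The `weight` variable and the temporary directory are made up for illustration, and a single variable stands in for a real model and optimizer.

```python
import os
import tempfile

import tensorflow as tf  # assumes TensorFlow 2.x

# Any trackable object can be attached by keyword; here just one variable.
weight = tf.Variable(1.0)
checkpoint = tf.train.Checkpoint(weight=weight)

checkpoint_prefix = os.path.join(tempfile.mkdtemp(), "ckpt")

# Pretend training changed the variable, then save.
weight.assign(42.0)
save_path = checkpoint.save(file_prefix=checkpoint_prefix)

# Clobber the value, then restore it from the checkpoint.
weight.assign(0.0)
status = checkpoint.restore(save_path)
status.assert_consumed()  # every value in the checkpoint found its object

restored_value = float(weight.numpy())
```

Matching is done on the keyword used in `tf.train.Checkpoint(weight=...)`, not on the variable's `name`, which is the difference from `tf.train.Saver` that the quoted documentation points out.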

Amir
  • 16,067
  • 10
  • 80
  • 119
7

As described in issue 6255:

use './model_name.ckpt':
saver.restore(sess,'./my_model_final.ckpt')

instead of

saver.restore(sess,'my_model_final.ckpt')
Grisha Levit
  • 8,194
  • 2
  • 38
  • 53
AI4U.ai
  • 71
  • 1
  • 2
7

For tensorflow 2.0, it is as simple as

# Save the model
model.save('path_to_my_model.h5')

To restore:

new_model = tensorflow.keras.models.load_model('path_to_my_model.h5')
serv-inc
  • 35,772
  • 9
  • 166
  • 188
  • What about all the custom tf operations and variables that are not part of the model object? Will they get saved somehow when you call save() on the model? I have various custom loss and tensorflow-probability expressions that are used in the inference and generation network but they are not part of my model. My keras model object only contains the dense and conv layers. In TF 1 I just called the save method and I could be sure that every operations and tensors used in my graph would get saved. In TF2 I don't see how the operations that are not somehow added to the keras model will get saved. – Kristof Aug 29 '19 at 19:11
  • Is there any more info on restoring models in TF 2.0? I can't restore weights from checkpoint files generated via the C api, see: https://stackoverflow.com/questions/57944786/keras-models-load-model-fails-with-tags-train – jregalad Sep 17 '19 at 18:39
  • @jregalad: it's complicated. Maybe my questions at https://stackoverflow.com/questions/56340852/distinguish-types-of-on-disk-models https://stackoverflow.com/questions/55849309/retrain-image-detection-with-mobilenet https://stackoverflow.com/questions/55829593/convert-output-of-retrain-py-to-tensorflow-js https://stackoverflow.com/questions/55829043/difference-between-tfjs-layers-model-and-tfjs-graph-model and https://stackoverflow.com/questions/55490885/error-converting-keras-model-to-tfjs-duplicate-weight-name-variable can help – serv-inc Sep 18 '19 at 07:22
6

For tensorflow-2.0

it's very simple.

import tensorflow as tf

SAVE

model.save("model_name")

RESTORE

model = tf.keras.models.load_model('model_name')
Ashiq Imran
  • 2,077
  • 19
  • 17
5

TensorFlow 2.6: it has become much simpler now. You can save a model in two formats:

  1. Saved_model ( tf-serving compatible)
  2. H5 or HDF5

Saving the model in both formats:

import tensorflow as tf

inputs = tf.keras.Input(shape=(224,224,3))
y = tf.keras.layers.Conv2D(24, 3, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(y)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.save("saved_model/my_model") #To Save in Saved_model format
model.save("my_model.h5") #To save model in H5 or HDF5 format

To load the model in both formats:

import tensorflow as tf
h5_model = tf.keras.models.load_model("my_model.h5") # loading model in h5 format
h5_model.summary()
saved_m = tf.keras.models.load_model("saved_model/my_model") #loading model in saved_model format
saved_m.summary()
keertika jain
  • 292
  • 2
  • 6
4

Here is a simple example using Tensorflow 2.0 SavedModel format (which is the recommended format, according to the docs) for a simple MNIST dataset classifier, using Keras functional API without too much fancy going on:

# Imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt

# Load data
mnist = tf.keras.datasets.mnist # 28 x 28
(x_train,y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixels [0,255] -> [0,1]
x_train = tf.keras.utils.normalize(x_train,axis=1)
x_test = tf.keras.utils.normalize(x_test,axis=1)

# Create model
input = Input(shape=(28,28), dtype='float64', name='graph_input')
x = Flatten()(input)
x = Dense(128, activation='relu')(x)
x = Dense(128, activation='relu')(x)
output = Dense(10, activation='softmax', name='graph_output', dtype='float64')(x)
model = Model(inputs=input, outputs=output)

model.compile(optimizer='adam',
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])

# Train
model.fit(x_train, y_train, epochs=3)

# Save model in SavedModel format (Tensorflow 2.0)
export_path = 'model'
tf.saved_model.save(model, export_path)

# ... possibly another python program 

# Reload model
loaded_model = tf.keras.models.load_model(export_path) 

# Get image sample for testing
index = 0
img = x_test[index] # I normalized the image on a previous step

# Predict using the signature definition (Tensorflow 2.0)
predict = loaded_model.signatures["serving_default"]
prediction = predict(tf.constant(img))

# Show results
print(np.argmax(prediction['graph_output']))  # prints the class number
plt.imshow(x_test[index], cmap=plt.cm.binary)  # prints the image

What is serving_default?

It's the name of the signature def of the tag you selected (in this case, the default serve tag was selected). The saved_model_cli tool can be used to find the tags and signatures of a model.

Disclaimers

This is just a basic example if you just want to get it up and running, but is by no means a complete answer - maybe I can update it in the future. I just wanted to give a simple example using the SavedModel in TF 2.0 because I haven't seen one, even this simple, anywhere.

@Tom's answer is a SavedModel example, but it will not work on Tensorflow 2.0, because unfortunately there are some breaking changes.

@Vishnuvardhan Janapati's answer says TF 2.0, but it's not for SavedModel format.

Bersan
  • 1,032
  • 1
  • 17
  • 28
3

Use tf.train.Saver to save a model. Remember, you need to specify the var_list if you want to reduce the model size. The var_list can be:

  • tf.trainable_variables or
  • tf.global_variables.
Mario
  • 1,631
  • 2
  • 21
  • 51
Ariel
  • 211
  • 1
  • 3
  • 9
3

You can save the variables in the network using

saver = tf.train.Saver() 
saver.save(sess, 'path of save/fileName.ckpt')

To restore the network for reuse later or in another script, use:

saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('path of save/'))
sess.run(....) 

Important points:

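  1. The graph structure must be the same between the first and later runs (coherent structure).
  2. tf.train.latest_checkpoint needs the path of the folder containing the saved files, not an individual file path; it returns the checkpoint prefix that saver.restore expects.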
sk29910
  • 2,326
  • 1
  • 18
  • 23
Ali Mahdavi
  • 132
  • 7
3

Following @Vishnuvardhan Janapati 's answer, here is another way to save and reload model with custom layer/metric/loss under TensorFlow 2.0.0

import tensorflow as tf
from tensorflow.keras.layers import Layer
from tensorflow.keras.utils.generic_utils import get_custom_objects

# custom loss (for example)  
def custom_loss(y_true,y_pred):
  return tf.reduce_mean(y_true - y_pred)
get_custom_objects().update({'custom_loss': custom_loss}) 

# custom layer (for example)
class CustomLayer(Layer):
  def __init__(self, ...):
      ...
  # define custom layer and all necessary custom operations inside custom layer

get_custom_objects().update({'CustomLayer': CustomLayer})  

In this way, once you have executed this code and saved your model with tf.keras.models.save_model, model.save, or the ModelCheckpoint callback, you can reload your model without passing the custom objects explicitly, as simply as

new_model = tf.keras.models.load_model("./model.h5")
yiyang
  • 75
  • 7
2

Wherever you want to save the model,

self.saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ...
    self.saver.save(sess, filename)

Make sure, all your tf.Variable have names, because you may want to restore them later using their names. And where you want to predict,

saver = tf.train.import_meta_graph(filename)  # path to the .meta file written by save()
name = 'name given when you saved the file'
with tf.Session() as sess:
    saver.restore(sess, name)
    print(sess.run('W1:0'))  # example to retrieve by variable name

Make sure that saver runs inside the corresponding session. Remember that, if you use the tf.train.latest_checkpoint('./'), then only the latest check point will be used.

Akshaya Natarajan
  • 1,865
  • 15
  • 17
2

I'm on Version:

tensorflow (1.13.1)
tensorflow-gpu (1.13.1)

Simple way is

Save:

model.save("model.h5")

Restore:

model = tf.keras.models.load_model("model.h5")
007fred
  • 155
  • 2
  • 8
1

In TensorFlow 2.0, saving and loading a model is a lot easier thanks to the Keras API, a high-level API for TensorFlow.

To save a model: Check the documentation for reference: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/models/save_model

tf.keras.models.save_model(model, filepath, save_format=save_format)

To load a model:

https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/models/load_model

model = tf.keras.models.load_model(filepath)
Vineet Suryan
  • 101
  • 1
  • 9
1

The easiest way is to use the Keras API: one line for saving the model and one line for loading it.

from keras.models import load_model

my_model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'

del my_model  # deletes the existing model


my_model = load_model('my_model.h5') # returns a compiled model identical to the previous one
Ali karimi
  • 371
  • 3
  • 10
0

You can use the saver object in Tensorflow to save your trained model. This object provides methods to save and restore models.

To save a trained model in TensorFlow:

tf.train.Saver.save(sess, save_path, global_step=None, latest_filename=None,
                    meta_graph_suffix='meta', write_meta_graph=True,
                    write_state=True, strip_default_attrs=False,
                    save_debug_info=False)

To restore a saved model in TensorFlow:

tf.train.Saver.restore(sess, save_path)
Matias Molinas
  • 2,246
  • 15
  • 11