Saving a trained TensorFlow model to run inference on another machine

Question

I'm relatively new to machine learning and the TensorFlow framework. I'm trying to take a trained model, heavily influenced by the code presented here and trained on the MNIST handwritten digit dataset, and run inference on test examples that I have created. However, I am doing the training on a remote machine with a GPU, so I'm trying to save the model to a directory that I can transfer to my local machine and run inference there.

It seems that I was able to save at least part of the model with tf.saved_model.simple_save; however, I'm unsure how to use the saved data to run inference and make a prediction on a new image. There seem to be multiple ways to save a model, and I am unsure which one is the convention, or the "correct way", in the TensorFlow framework.

So far, this is the line that I think I need, but I'm unsure whether it is correct:

    tf.saved_model.simple_save(sess, 'mnist_model',
                               inputs={'x': self.x},
                               outputs={'y_': self.y_, 'y_conv': self.y_conv})

If someone could point me in the direction of how to properly save a trained model, and which variables to use so that I can run inference with the saved model, I'd really appreciate it.
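For reference, a model exported with tf.saved_model.simple_save can be loaded back with tf.saved_model.loader.load, and the exported signature then gives the tensor names to feed and fetch, without redefining the graph in Python. The sketch below is a hedged illustration, not the asker's model: it uses TensorFlow 1.x-style graph APIs through tensorflow.compat.v1 (so it should also run on TF 2.x), and the tiny one-layer network, the export directory and the keys 'x' and 'y_conv' are stand-ins. Only the prediction tensor is exported as an output here, on the assumption that y_ is the label placeholder (as in the TensorFlow deep MNIST tutorial), which the model consumes during training rather than produces.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # TF1-style graphs and sessions


def export_model(export_dir):
    """Build a tiny stand-in classifier and export it with simple_save."""
    with tf.Graph().as_default(), tf.Session() as sess:
        x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
        w = tf.Variable(tf.random_normal([784, 10], seed=0), name="w")
        b = tf.Variable(tf.zeros([10]), name="b")
        y_conv = tf.add(tf.matmul(x, w), b, name="y_conv")
        sess.run(tf.global_variables_initializer())
        # export_dir must not exist yet; simple_save creates it
        tf.saved_model.simple_save(sess, export_dir,
                                   inputs={"x": x},
                                   outputs={"y_conv": y_conv})


def predict(export_dir, images):
    """Load the SavedModel into a fresh graph and run its serving signature."""
    with tf.Graph().as_default(), tf.Session() as sess:
        meta_graph = tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING], export_dir)
        # simple_save stores its signature under the default serving key
        signature = meta_graph.signature_def[
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
        x_name = signature.inputs["x"].name        # e.g. "x:0"
        y_name = signature.outputs["y_conv"].name  # e.g. "y_conv:0"
        logits = sess.run(y_name, feed_dict={x_name: images})
        return np.argmax(logits, axis=1)


export_dir = os.path.join(tempfile.mkdtemp(), "mnist_model")
export_model(export_dir)
print(predict(export_dir, np.zeros((3, 784), dtype=np.float32)))
```

On the local machine only predict() is needed: copy the whole export directory over and point predict() at it.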

score 2 · Accepted Answer · answered Aug 03 '18 at 20:10

One way to do this is to create a tf.train.Saver() object in your graph definition, then use it to save the network's weights to a specified directory. That directory can then be downloaded from the remote machine to your local one and the weights restored locally. Here is a small example network:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# >>>> Config. Vars <<<<
TRAIN_STEPS = 1000
SAVE_EVERY  = 100

# >>>> Network <<<<
inputs = tf.placeholder(tf.float32, shape=[None, 784])
labels = tf.placeholder(tf.float32, shape=[None, 10])
h1     = tf.layers.dense(inputs, 256, activation=tf.nn.relu, use_bias=True)
logits = tf.layers.dense(h1, 10, use_bias=True)
predictions = tf.nn.softmax(logits)
prediction_ids = tf.argmax(predictions, axis=1)

# >>>> Loss & Optimisation <<<<
# Average the per-example cross-entropy into a single scalar loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
opt  = tf.train.AdamOptimizer().minimize(loss)

# >>>> Utilities <<<<
init  = tf.global_variables_initializer()
saver = tf.train.Saver()  # saves/restores all variables defined above

with tf.Session() as sess:
    sess.run(init)

    # >>>> Training - run on remote, comment out locally <<<<
    for i in range(TRAIN_STEPS):
        print("Train step {}".format(i), end="\r")
        batch_data, batch_labels = mnist.train.next_batch(batch_size=128)
        feed_dict = {
            inputs: batch_data,
            labels: batch_labels
        }
        l, _ = sess.run([loss, opt], feed_dict=feed_dict)
        # Save periodically, and always after the final training step
        if i % SAVE_EVERY == 0 or i == TRAIN_STEPS - 1:
            saver.save(sess, "saved_model/network_weights.ckpt")

    # >>>> Using the network - run locally to use the network <<<<
    # restore() overwrites the freshly initialised variables with the saved values
    saver.restore(sess, "saved_model/network_weights.ckpt")
    test_data, test_labels = mnist.test.images, mnist.test.labels
    # Only the inputs are needed to compute predictions
    preds = sess.run(prediction_ids, feed_dict={inputs: test_data})
    print(preds)
    print("Test accuracy: {:.4f}".format((preds == test_labels.argmax(axis=1)).mean()))
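One detail worth checking in any save-every-N loop: if i % SAVE_EVERY == 0 is the only condition, then with TRAIN_STEPS = 1000 and SAVE_EVERY = 100 the saves happen at i = 0, 100, ..., 900, so the checkpoint misses the last 99 updates. The plain-Python helper below (checkpoint_steps is a hypothetical name, not a TensorFlow function) just lists the save points, with and without an extra save after the final step:

```python
def checkpoint_steps(train_steps, save_every, include_last=True):
    """List the loop indices i at which saver.save() would run."""
    steps = [i for i in range(train_steps) if i % save_every == 0]
    if include_last and steps[-1] != train_steps - 1:
        steps.append(train_steps - 1)  # also save after the final update
    return steps


print(checkpoint_steps(1000, 100, include_last=False)[-1])  # 900
print(checkpoint_steps(1000, 100)[-2:])                     # [900, 999]
```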

Once you have defined the saver, you can use it to save the weights to the specified directory, in this case "saved_model". That directory has to exist before saver.save() is called, so create it before you run this code.
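If you would rather not create the directory by hand, a small helper can create it just before the first save. This is a plain-Python sketch under that assumption: the name checkpoint_prefix is hypothetical, and it relies only on os.makedirs(..., exist_ok=True), which does nothing if the directory already exists.

```python
import os
import tempfile


def checkpoint_prefix(directory, name="network_weights.ckpt"):
    """Create `directory` if needed and return the path prefix for saver.save()."""
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    return os.path.join(directory, name)


# Example: prepare a fresh "saved_model" directory under a temporary location
demo_root = tempfile.mkdtemp()
demo_prefix = checkpoint_prefix(os.path.join(demo_root, "saved_model"))
print(demo_prefix)  # .../saved_model/network_weights.ckpt
```

In the training script this would be used as saver.save(sess, checkpoint_prefix("saved_model")).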

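Restoring the model is as simple as calling saver.restore() and passing it the session and the path to where your weights are stored. The graph (the same variable names and shapes) has to be defined before you restore, because the checkpoint only holds variable values. Note that saver.save() writes several files (network_weights.ckpt.index, network_weights.ckpt.data-00000-of-00001, network_weights.ckpt.meta and a checkpoint file), so copy the whole "saved_model" directory. In short: run this code on your remote machine, download the "saved_model" directory to your local machine, then run the code again with the training part commented out to actually use the model.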
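Instead of commenting the training loop in and out, one option is to put the graph definition in a function that both the training script and the inference script call; the checkpoint then only has to supply the variable values. The sketch below is hedged: it uses tensorflow.compat.v1 so it should also run on TF 2.x, it replaces tf.layers.dense with explicitly named tf.get_variable weights, the helper names (build_network, save_weights, restore_and_predict) are hypothetical, and "training" is stood in for by saving freshly initialised weights.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # TF1-style graphs and sessions


def build_network():
    """Define the graph. Call it the same way when training and when restoring."""
    inputs = tf.placeholder(tf.float32, shape=[None, 784], name="inputs")
    w1 = tf.get_variable("w1", [784, 256])
    b1 = tf.get_variable("b1", [256], initializer=tf.zeros_initializer())
    h1 = tf.nn.relu(tf.matmul(inputs, w1) + b1)
    w2 = tf.get_variable("w2", [256, 10])
    b2 = tf.get_variable("b2", [10], initializer=tf.zeros_initializer())
    logits = tf.matmul(h1, w2) + b2
    return inputs, tf.argmax(logits, axis=1)


def save_weights(prefix, images):
    """Stand-in for the remote run: initialise, save, and predict once."""
    with tf.Graph().as_default():
        inputs, prediction_ids = build_network()
        saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            saver.save(sess, prefix)
            return sess.run(prediction_ids, feed_dict={inputs: images})


def restore_and_predict(prefix, images):
    """Local run: rebuild the same graph, then load the saved variable values."""
    with tf.Graph().as_default():
        inputs, prediction_ids = build_network()
        saver = tf.train.Saver()
        with tf.Session() as sess:
            saver.restore(sess, prefix)  # sets every saved variable; no initializer needed
            return sess.run(prediction_ids, feed_dict={inputs: images})


images = np.random.RandomState(0).rand(5, 784).astype(np.float32)
prefix = os.path.join(tempfile.mkdtemp(), "network_weights.ckpt")
expected = save_weights(prefix, images)
print(restore_and_predict(prefix, images))  # same class ids as `expected`
```

Because both scripts import the same build_network(), the variable names in the checkpoint always match the rebuilt graph.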

Oh I see, I guess I didn't realize that you need to reconstruct the model and redefine variables when restoring the graph. Thank you! — Alexander Gharibian, Aug 07 '18 at 15:26
