TensorFlow - Saving and restoring a model

Question

I came across this question on Stack Overflow that shows how to save and restore a model.

My question is how to do that in my code below, as I'm not sure how to integrate it:

import numpy as np
import matplotlib.pyplot as plt
import cifar_tools
import tensorflow as tf

data, labels = cifar_tools.read_data('C:\\Users\\abc\\Desktop\\Testing')

x = tf.placeholder(tf.float32, [None, 150 * 150])
y = tf.placeholder(tf.float32, [None, 2])

w1 = tf.Variable(tf.random_normal([5, 5, 1, 64]))
b1 = tf.Variable(tf.random_normal([64]))

w2 = tf.Variable(tf.random_normal([5, 5, 64, 64]))
b2 = tf.Variable(tf.random_normal([64]))

w3 = tf.Variable(tf.random_normal([38*38*64, 1024]))
b3 = tf.Variable(tf.random_normal([1024]))

w_out = tf.Variable(tf.random_normal([1024, 2]))
b_out = tf.Variable(tf.random_normal([2]))

def conv_layer(x,w,b):
    conv = tf.nn.conv2d(x,w,strides=[1,1,1,1], padding = 'SAME')
    conv_with_b = tf.nn.bias_add(conv,b)
    conv_out = tf.nn.relu(conv_with_b)
    return conv_out

def maxpool_layer(conv,k=2):
    return tf.nn.max_pool(conv, ksize=[1,k,k,1], strides=[1,k,k,1], padding='SAME')

def model():
    x_reshaped = tf.reshape(x, shape=[-1, 150, 150, 1])

    conv_out1 = conv_layer(x_reshaped, w1, b1)
    maxpool_out1 = maxpool_layer(conv_out1)
    norm1 = tf.nn.lrn(maxpool_out1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
    conv_out2 = conv_layer(norm1, w2, b2)
    norm2 = tf.nn.lrn(conv_out2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
    maxpool_out2 = maxpool_layer(norm2)

    maxpool_reshaped = tf.reshape(maxpool_out2, [-1, w3.get_shape().as_list()[0]])
    local = tf.add(tf.matmul(maxpool_reshaped, w3), b3)
    local_out = tf.nn.relu(local)

    out = tf.add(tf.matmul(local_out, w_out), b_out)
    return out

model_op = model()

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=model_op, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

correct_pred = tf.equal(tf.argmax(model_op, 1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    onehot_labels = tf.one_hot(labels, 2, on_value=1.,off_value=0.,axis=-1)
    onehot_vals = sess.run(onehot_labels)
    batch_size = 1
    for j in range(0, 5):
        print('EPOCH', j)
        for i in range(0, len(data), batch_size):
            batch_data = data[i:i+batch_size, :]
            batch_onehot_vals = onehot_vals[i:i+batch_size, :]
            _, accuracy_val = sess.run([train_op, accuracy], feed_dict={x: batch_data, y: batch_onehot_vals})
            print(i, accuracy_val)

        print('DONE WITH EPOCH')

Thanks.

score 0 · Answer 1 · answered Mar 27 '17 at 04:43

Here is some sample code I have used in the past for restoring. This should be done after the session creation, but before running the model.

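# Create the saver after all variables have been defined.
saver = tf.train.Saver()

# Look up the most recent checkpoint recorded in the checkpoint directory.
ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
    print(ckpt.model_checkpoint_path)
    # The step is the number after the last '-' in the file name, e.g. model.ckpt-1000.
    i_stopped = int(ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1])
else:
    print('No checkpoint file found!')
    i_stopped = 0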
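The step parsing in the restore snippet above assumes the checkpoint path ends in `-<step>`, which is what `saver.save(..., global_step=i)` produces. If a checkpoint was saved without `global_step`, the `int(...)` call would raise. A slightly more defensive version might look like the sketch below; `step_from_checkpoint_path` is a hypothetical helper name, not part of TensorFlow.

```python
import re


def step_from_checkpoint_path(path):
    """Return the trailing step number of a checkpoint path, or 0 if there is none.

    Hypothetical helper: it only inspects the string, so it behaves the same
    for forward-slash and backslash (Windows) paths.
    """
    match = re.search(r'-(\d+)$', path)
    return int(match.group(1)) if match else 0


print(step_from_checkpoint_path('/path/to/ckpts/model.ckpt-3000'))  # 3000
print(step_from_checkpoint_path(r'C:\ckpts\model.ckpt-3000'))       # 3000
print(step_from_checkpoint_path('/path/to/ckpts/model.ckpt'))       # 0
```

In the restore code, `i_stopped = step_from_checkpoint_path(ckpt.model_checkpoint_path)` would then replace the chained `split` calls.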

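To save, write a checkpoint every 1000 batches or, in your case, at the end of every epoch:

# Needs `import os` at the top of the script.
if i % 1000 == 0:
    checkpoint_path = os.path.join(FLAGS.checkpoint_dir, 'model.ckpt')
    # global_step=i appends the step to the file name, e.g. model.ckpt-1000.
    saver.save(sess, checkpoint_path, global_step=i)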

It should be fairly straightforward to integrate this into your code. Remember that you must define the checkpoint directory where the model will be saved.
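As a rough illustration of how the two snippets could slot into the epoch loop from the question, here is a minimal sketch. Several things in it are assumptions rather than the asker's setup: the CNN is replaced by a single stand-in weight matrix, the data is random, `ckpt_dir` is a hypothetical temp-directory path, and the import goes through `tensorflow.compat.v1` so the sketch also runs on TensorFlow 2.x (on an older 1.x install, plain `import tensorflow as tf` without the `disable_v2_behavior()` call should be the equivalent).

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode, matching the question's TF 1.x style

# Hypothetical checkpoint location; replace with your own directory.
ckpt_dir = os.path.join(tempfile.gettempdir(), 'cnn_ckpts')
os.makedirs(ckpt_dir, exist_ok=True)
checkpoint_path = os.path.join(ckpt_dir, 'model.ckpt')

# Stand-in graph: one weight matrix instead of the question's CNN.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 2])
w = tf.Variable(tf.random_normal([4, 2]))
logits = tf.matmul(x, w)
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

# Stand-in data: 8 random samples with random one-hot labels.
data = np.random.rand(8, 4).astype(np.float32)
onehot_vals = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, size=8)]

# Create the saver after the optimizer so its variables are saved too.
saver = tf.train.Saver()
num_epochs = 5
batch_size = 1

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Restore: after session creation, before training starts.
    ckpt = tf.train.get_checkpoint_state(ckpt_dir)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        # Resume from the epoch after the one stored in the file name.
        start_epoch = int(ckpt.model_checkpoint_path.rsplit('-', 1)[-1]) + 1
    else:
        print('No checkpoint file found!')
        start_epoch = 0

    for j in range(start_epoch, num_epochs):
        for i in range(0, len(data), batch_size):
            sess.run(train_op, feed_dict={
                x: data[i:i + batch_size, :],
                y: onehot_vals[i:i + batch_size, :],
            })
        # Save once per epoch; the epoch index becomes the file-name suffix.
        save_path = saver.save(sess, checkpoint_path, global_step=j)
        print('saved', save_path)
```

Running the script a second time should find the last checkpoint, restore it, and skip the epochs that were already completed.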

Hope this helps!

Thanks for your kind reply. When trying to run the code, I'm getting: Traceback (most recent call last): File "cnn.py", line 63, in ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir) NameError: name 'FLAGS' is not defined — Simplicity, Mar 27 '17 at 12:48
Yes, the `FLAGS.checkpoint_dir` can be anything you'd like. I suggest explicitly defining your path, say `ckpt_path = '/path/to/ckpts/'`, pointing to where you would like your checkpoints to be stored, and using `ckpt_path` in place of `FLAGS.checkpoint_dir`. Apart from that, everything else should be fine — The Brofessor, Mar 27 '17 at 15:37
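If you would rather keep the `FLAGS.checkpoint_dir` spelling than switch to a plain variable, one possible stand-in is a small `argparse` wrapper. This is an assumption for illustration only: the answer does not say where its `FLAGS` object came from, and both `parse_flags` and the temp-directory default below are hypothetical.

```python
import argparse
import os
import tempfile


def parse_flags(argv=None):
    """Build a FLAGS-like namespace with a checkpoint_dir attribute."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--checkpoint_dir',
        # Hypothetical default; point this at your own directory.
        default=os.path.join(tempfile.gettempdir(), 'ckpts'),
        help='Directory where checkpoints are written and read.')
    # parse_known_args ignores unrelated command-line arguments.
    flags, _unparsed = parser.parse_known_args(argv)
    return flags


# An empty list uses the defaults; call parse_flags() to read sys.argv instead.
FLAGS = parse_flags([])

# Create the directory up front so saving does not depend on it already existing.
os.makedirs(FLAGS.checkpoint_dir, exist_ok=True)
checkpoint_path = os.path.join(FLAGS.checkpoint_dir, 'model.ckpt')
print(checkpoint_path)
```

With this at the top of the script, the answer's snippets can use `FLAGS.checkpoint_dir` unchanged.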
