
I'm trying to restore a model in TensorFlow which I've trained. The problem is that the weights do not seem to be properly restored.

For the training I've got the weights and biases defined as:

W = {
   'h1': tf.Variable(tf.random_normal([n_inputs, n_hidden_1]), name='wh1'),
   'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]), name='wh2'),
   'o': tf.Variable(tf.random_normal([n_hidden_2, n_classes]), name='wo')
}
b = {
   'b1': tf.Variable(tf.random_normal([n_hidden_1]), name='bh1'),
   'b2': tf.Variable(tf.random_normal([n_hidden_2]), name='bh2'),
   'o': tf.Variable(tf.random_normal([n_classes]), name='bo')
}

Then I do some training on my own custom 2D image dataset and save the model with a tf.train.Saver:

saver = tf.train.Saver()
saver.save(sess, 'tf.model')

Later I want to restore that model with the exact same weights, so I build the model as before (also with the random_normal initialization) and call saver.restore:

saver = tf.train.import_meta_graph('tf.model.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
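
As an aside, a sketch assuming the import above succeeded: import_meta_graph rebuilds the graph from the .meta file, so the restored weights are addressed by name through that graph, not through a W dict constructed separately beforehand.

# After import_meta_graph + restore, fetch the restored tensor by name from
# the (re-imported) default graph rather than from a separately built dict.
g = tf.get_default_graph()
wh1 = g.get_tensor_by_name('wh1:0')
print(sess.run(wh1)[0][0])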

Now, if I call:

temp = sess.run(W['h1'][0][0])
print temp

I get random values, and not the restored value of the weight.

I've drawn a blank on this one, can somebody point me in the right direction?

FYI, I've tried (without luck) to simply declare the tf.Variables, but I keep getting:

ValueError: initial_value must be specified.

even though TensorFlow's documentation states that it should be possible to simply declare a variable with no initial value (https://www.tensorflow.org/programmers_guide/variables, section: Restoring Values)
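
For what it's worth, that section of the docs seems to mean that a restored variable's initializer never has to be run, not that initial_value can be omitted from the tf.Variable constructor. A minimal sketch of the pattern, assuming TF 1.x and the variable definitions above:

# Rebuild a variable under the saved name; its random_normal initializer is
# never run, because restore() assigns the checkpoint value instead.
W_h1 = tf.Variable(tf.random_normal([n_inputs, n_hidden_1]), name='wh1')
# ... rebuild the remaining weights and biases the same way ...

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'tf.model')   # note: no global_variables_initializer()
    print(sess.run(W_h1)[0][0])       # should now print the trained value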

Update 1

When I, as suggested, run

all_vars = tf.global_variables()
for v in all_vars:
   print v.name

I get the following output:

wh1:0
wh2:0
wo:0
bh1:0
bh2:0
bo:0
wh1:0
wh2:0
wo:0
bh1:0
bh2:0
bo:0
beta1_power:0
beta2_power:0
wh1/Adam:0
wh1/Adam_1:0
wh2/Adam:0
wh2/Adam_1:0
wo/Adam:0
wo/Adam_1:0
bh1/Adam:0
bh1/Adam_1:0
bh2/Adam:0
bh2/Adam_1:0
bo/Adam:0
bo/Adam_1:0

This shows that the variables are indeed read. However, invoking

print sess.run("wh1:0")

results in the error: Attempting to use uninitialized value wh1

  • Did you try running `sess.run("wh1:0")`, which should be the name of the weight variable in the graph? Another suggestion to debug this is to print all the restored variable names: `all_vars = tf.global_variables()` `for v in all_vars:` `print v.name` – Kochoba Mar 06 '17 at 22:09
  • Moreover, don't initialize global variables when you use restore. – Kochoba Mar 06 '17 at 22:15
  • @Kochoba: Please see my update of the original question. I've added the output of what you suggested I run. However, I get an error from `print sess.run("wh1:0")`, as it is not initialized. How can I solve that? – Nicolai Anton Lynnerup Mar 07 '17 at 05:54
  • The code I am using to restore the model is as follows, although I've seen a lot of people suggesting the method you are using. When I save the model I use: `saver.save(sess, "model.ckpt", global_step=global_step)` (the global_step argument is optional). Then when I restore the model I just use: `saver.restore(sess, "model.ckpt-")`. I know that `saver = tf.train.import_meta_graph('tf.model.meta')` restores an uninitialized graph with all the variable names, but `tf.train.latest_checkpoint('./')` might fail to fetch the stored values (a sketch of this pattern follows these comments). – Kochoba Mar 07 '17 at 19:28
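
Expanding on that comment, a minimal sketch of the pattern it describes (the checkpoint path is a hypothetical placeholder for whatever prefix the save produced):

# Rebuild the same graph in code, then restore from an explicit checkpoint
# prefix instead of relying on tf.train.latest_checkpoint().
# 'model.ckpt-100' is a hypothetical path standing in for the real
# global_step suffix written by saver.save().
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'model.ckpt-100')
    # tf.train.latest_checkpoint('./') only resolves if the directory
    # contains a valid 'checkpoint' index file pointing at that prefix.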

4 Answers


So with the help of you guys, I ended up dividing the saving and restoring parts of my program into two files, to ensure that no unwanted variables were initialized.

Train and save routines fnn.py:

def build(self, topology):
    """
    Builds the topology of the model
    """

    # Sanity check
    assert len(topology) == 4

    n_inputs = topology[0]
    n_hidden_1 = topology[1]
    n_hidden_2 = topology[2]
    n_classes = topology[3]

    # Sanity check
    assert self.img_h * self.img_w == n_inputs

    # Instantiate TF Placeholders
    self.x = tf.placeholder(tf.float32, [None, n_inputs], name='x')
    self.y = tf.placeholder(tf.float32, [None, n_classes], name='y')
    self.W = {
        'h1': tf.Variable(tf.random_normal([n_inputs, n_hidden_1]), name='wh1'),
        'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]), name='wh2'),
        'o': tf.Variable(tf.random_normal([n_hidden_2, n_classes]), name='wo')
    }
    self.b = {
        'b1': tf.Variable(tf.random_normal([n_hidden_1]), name='bh1'),
        'b2': tf.Variable(tf.random_normal([n_hidden_2]), name='bh2'),
        'o': tf.Variable(tf.random_normal([n_classes]), name='bo')
    }

    # Create model
    self.l1 = tf.nn.sigmoid(tf.add(tf.matmul(self.x, self.W['h1']), self.b['b1']))
    self.l2 = tf.nn.sigmoid(tf.add(tf.matmul(self.l1, self.W['h2']), self.b['b2']))
    logits = tf.add(tf.matmul(self.l2, self.W['o']), self.b['o'])

    # Define predict operation
    self.predict_op = tf.argmax(logits, 1)
    probs = tf.nn.softmax(logits, name='probs')

    # Define cost function
    self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=self.y))

    # Adding these to collection so we can restore them again
    tf.add_to_collection('inputs', self.x)
    tf.add_to_collection('inputs', self.y)
    tf.add_to_collection('outputs', logits)
    tf.add_to_collection('outputs', probs)
    tf.add_to_collection('outputs', self.predict_op)

def train(self, X, Y, n_epochs=10, learning_rate=0.001, logs_path=None):
    """
    Trains the Model
    """
    self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)

    costs = []

    # Instantiate TF Saver
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())

        # start the threads used for reading files
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        # Compute total number of batches
        total_batch = int(self.get_num_examples() / self.batch_size)

        # start training
        for epoch in range(n_epochs):
            for i in range(total_batch):

                batch_xs, batch_ys = sess.run([X, Y])

                # run the training step with feed of images
                _, cost = sess.run([self.optimizer, self.cost], feed_dict={self.x: batch_xs,
                                                                           self.y: batch_ys})
                costs.append(cost)
                print "step %d" % (epoch * total_batch + i)
            #costs.append(cost)
            print "Epoch %d" % epoch

        saver.save(sess, self.model_file)

        temp = sess.run(self.W['h1'][0][0])
        print temp

        if self.visu:
            plt.plot(costs)
            plt.show()

        # finalize
        coord.request_stop()
        coord.join(threads)

Predict routine fnn_eval.py:

with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())

        g = tf.get_default_graph()

        # restore the model
        self.saver = tf.train.import_meta_graph(self.model_file)
        self.saver.restore(sess, tf.train.latest_checkpoint('./tfmodels/fnn/'))

        wh1 = g.get_tensor_by_name("wh1:0")
        print sess.run(wh1[0][0])

        x, y = tf.get_collection('inputs')
        logits, probs, predict_op = tf.get_collection('outputs')

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        predictions = []

        print Y.eval()

        for i in range(1):#range(self.get_num_examples()):
            batch_xs = sess.run(X)
            # Reshape batch_xs if only a single image is given
            #   (numpy is 4D: batch_size * heigth * width * channels)
            batch_xs = np.reshape(batch_xs, (-1, self.img_w * self.img_h))
            prediction, probabilities, logit = sess.run([predict_op, probs, logits], feed_dict={x: batch_xs})
            predictions.append(prediction[0])

        # finalize
        coord.request_stop()
        coord.join(threads)

I guess the problem might be caused by creating a new variable when you restore the model, instead of getting the already existing variable. I tried this code:

saver = tf.train.import_meta_graph('./model.ckpt-10.meta')
w1 = None
for v in tf.global_variables():
        print v.name

w1 = tf.get_variable('wh1', [])

init = tf.global_variables_initializer()
sess.run(init)

saver.restore(sess, './model.ckpt-10')

for v in tf.global_variables():
    print v.name

and from the output you can clearly see that it creates a new variable called wh1_1:0.

If you try this

w1 = None

for v in tf.global_variables():
    print v.name
    if v.name == 'wh1:0':
        w1 = v

init = [tf.global_variables_initializer(), tf.local_variables_initializer()]
sess.run(init)

saver.restore(sess, './model.ckpt-10')

for v in tf.global_variables():
    print v.name

temp = sess.run(w1)
print temp[0][0]

There will be no problem.

TensorFlow suggests that it is better to use tf.variable_scope() (link), like this:

with tf.variable_scope("foo"):
    v = tf.get_variable("v", [1])
with tf.variable_scope("foo", reuse=True):
    v1 = tf.get_variable("v", [1])
assert v1 == v
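
Applied to the question's weights, a hypothetical adaptation (not the original code) would create them with tf.get_variable, so a later lookup returns the very same object instead of silently creating wh1_1:0:

# Hypothetical adaptation: create the weight with get_variable inside a scope
with tf.variable_scope('net'):
    w_h1 = tf.get_variable('wh1', [n_inputs, n_hidden_1],
                           initializer=tf.random_normal_initializer())

# ... later, reuse=True returns the identical variable object:
with tf.variable_scope('net', reuse=True):
    w_h1_again = tf.get_variable('wh1', [n_inputs, n_hidden_1])
assert w_h1 is w_h1_again
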
– LI Xuhong

I have met the same problem when saving a model in the saved_model format. If you use the function add_meta_graph_and_variables to save a model for serving, be careful about the legacy_init_op parameter: "Legacy support for op or group of ops to execute after the restore op upon a load."

– bidai

You want to pass in a var_list to the Saver.

In your case, the variable list would come from your W and b dictionaries: var_list = list(W.values())+list(b.values()). Then, to restore the model, pass in var_list to the Saver: saver = tf.train.Saver(var_list=var_list).

Next, you need to get your checkpoint state: model = tf.train.get_checkpoint_state(<your saved model directory>). After that you can restore the trained weights.

var_list = list(W.values())+list(b.values())
saver = tf.train.Saver(var_list=var_list)
model = tf.train.get_checkpoint_state('./model/')

with tf.Session() as sess:
    saver.restore(sess, model.model_checkpoint_path)
    # Now use the pretrained weights
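
As a variant sketch (not part of this answer's original code): var_list can also be a dict that maps the names stored in the checkpoint to variables in the current graph, which makes the name matching explicit:

# var_list as a dict: checkpoint name -> variable in the current graph.
# The keys are the names the question's variables were saved under.
restore_map = {'wh1': W['h1'], 'wh2': W['h2'], 'wo': W['o'],
               'bh1': b['b1'], 'bh2': b['b2'], 'bo': b['o']}
saver = tf.train.Saver(var_list=restore_map)
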
– user3813674
  • Would you care to elaborate on why I should pass the variables to the Saver constructor? From tf: "If you do not pass any argument to tf.train.Saver() the saver handles all variables in the graph. Each one of them is saved under the name that was passed when the variable was created." Source: https://www.tensorflow.org/programmers_guide/variables – Nicolai Anton Lynnerup Mar 07 '17 at 05:16