
I'm having trouble correctly restoring a saved model in TensorFlow. I created a bidirectional RNN model with the following code:

batchX_placeholder = tf.placeholder(tf.float32, [None, timesteps, 1],
                                    name="batchX_placeholder")
batchY_placeholder = tf.placeholder(tf.float32, [None, num_classes],
                                    name="batchY_placeholder")
weights = tf.Variable(np.random.rand(2*STATE_SIZE, num_classes),
                      dtype=tf.float32, name="weights")
biases = tf.Variable(np.zeros((1, num_classes)), dtype=tf.float32,
                     name="biases")
logits = BiRNN(batchX_placeholder, weights, biases)
with tf.name_scope("prediction"):
    prediction = tf.nn.softmax(logits)
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=batchY_placeholder))
lr = tf.Variable(learning_rate, trainable=False, dtype=tf.float32,
                 name='lr')
optimizer = tf.train.AdamOptimizer(learning_rate=lr)
train_op = optimizer.minimize(loss_op)
init_op = tf.initialize_all_variables()
saver = tf.train.Saver()

The BiRNN architecture is created with the following function:

def BiRNN(x, weights, biases):
    # Unstack to get a list of 'timesteps' tensors of shape
    # (batch_size, num_input)
    x = tf.unstack(x, timesteps, 1)
    # Forward and Backward direction cells
    lstm_fw_cell = rnn.BasicLSTMCell(STATE_SIZE, forget_bias=1.0)
    lstm_bw_cell = rnn.BasicLSTMCell(STATE_SIZE, forget_bias=1.0)
    outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell,
        lstm_bw_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights) + biases

Then I train the model and save it every 200 steps:

with tf.Session() as sess:
    sess.run(init_op)
    current_step = 0
    for batch_x, batch_y in get_minibatch():
        sess.run(train_op, feed_dict={batchX_placeholder: batch_x,
                                      batchY_placeholder: batch_y})
        current_step += 1
        if current_step % 200 == 0:
            saver.save(sess, os.path.join(model_dir, "model"))

To run the saved model in inference mode, I use the TensorFlow graph saved in the "model.meta" file:

graph = tf.get_default_graph()
saver = tf.train.import_meta_graph(os.path.join(model_dir, "model.meta"))
sess = tf.Session()
saver.restore(sess, tf.train.latest_checkpoint(model_dir))
weights = graph.get_tensor_by_name("weights:0")
biases = graph.get_tensor_by_name("biases:0")
batchX_placeholder = graph.get_tensor_by_name("batchX_placeholder:0")
batchY_placeholder = graph.get_tensor_by_name("batchY_placeholder:0")
logits = BiRNN(batchX_placeholder, weights, biases)
prediction = graph.get_operation_by_name("prediction/Softmax")
argmax_pred = tf.argmax(prediction, 1)
init = tf.global_variables_initializer()
sess.run(init)
for x_seq, y_gt in get_sequence():
    _, y_pred = sess.run([prediction, argmax_pred],
                    feed_dict={batchX_placeholder: [x_seq],
                               batchY_placeholder: [[0.0, 0.0]]})
    print("Y ground truth: " + str(y_gt) + ", Y pred: " + str(y_pred[0]))

And when I run the code in inference mode, I get different results each time I launch it. It seems that the output neurons of the softmax layer get randomly bound to different output classes.

So, my question is: how can I save and then correctly restore the model in TensorFlow, so that all output neurons stay properly bound to their corresponding output classes?

Nurtas

1 Answer

There is no need to call tf.global_variables_initializer(); I think that is your problem.

I removed the logits, weights, and biases operations, since you don't need to rebuild them: they are already part of the loaded graph, and you can fetch any of them with graph.get_tensor_by_name.

For the prediction, get the tensor instead of the operation (see this answer): get_operation_by_name returns an Operation object, while the values you want come from its output tensor, "prediction/Softmax:0".

This is the code:

graph = tf.get_default_graph()
saver = tf.train.import_meta_graph(os.path.join(model_dir, "model.meta"))
sess = tf.Session()
saver.restore(sess, tf.train.latest_checkpoint(model_dir))

batchX_placeholder = graph.get_tensor_by_name("batchX_placeholder:0")
batchY_placeholder = graph.get_tensor_by_name("batchY_placeholder:0")
prediction = graph.get_tensor_by_name("prediction/Softmax:0")
argmax_pred = tf.argmax(prediction, 1)
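
For completeness, inference can then run directly against these restored tensors, without rebuilding BiRNN; here is a minimal sketch that reuses the get_sequence() generator and the dummy label feed from your question:

for x_seq, y_gt in get_sequence():
    # Run the restored softmax tensor and its argmax in one call;
    # no extra variable initialization is needed after saver.restore().
    probs, y_pred = sess.run([prediction, argmax_pred],
                             feed_dict={batchX_placeholder: [x_seq],
                                        batchY_placeholder: [[0.0, 0.0]]})
    print("Y ground truth: " + str(y_gt) + ", Y pred: " + str(y_pred[0]))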

Edit 1: I realize I wasn't clear about why you got different results.

And when I run the code in inference mode, I get different results each time I launch it.

Notice that although you used the weights from the loaded model, you are creating the BiRNN again, and the BasicLSTMCell also has weights and other variables that are not set from your loaded model. They therefore have to be initialized with new random values, which effectively gives you an untrained model again.
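
If you want to confirm this, you can list which variables are still uninitialized after restoring; a small sketch, assuming the restore code from your question has already run and BiRNN has been rebuilt:

# After calling BiRNN(...) again, the new LSTM kernels and biases exist in
# the graph but hold no values from the checkpoint.
print(sess.run(tf.report_uninitialized_variables()))
# Expect to see the freshly created bidirectional LSTM variables listed here,
# which is why each launch gives different (untrained) predictions.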

Julio Daniel Reyes