Tensorflow: Run training phase on GPU and test phase on CPU

Question

I want to run the training phase of my TensorFlow code on my GPU. Once training has finished and the results are stored, I want to load the saved model and run its test phase on the CPU.

I have written the code below. It is only an excerpt for reference, because the full script is very long; I know the rules ask for a fully functional example, and I apologise for that.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.contrib.rnn.python.ops import rnn_cell, rnn

# Import MNIST data http://yann.lecun.com/exdb/mnist/
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x_train = mnist.train.images 
# Check that the dataset contains 55,000 rows and 784 columns
N,D = x_train.shape

tf.reset_default_graph()
sess = tf.InteractiveSession()

x = tf.placeholder("float", [None, n_steps,n_input]) 
y_true = tf.placeholder("float", [None, n_classes]) 
keep_prob = tf.placeholder(tf.float32,shape=[])
learning_rate = tf.placeholder(tf.float32,shape=[]) 

#[............Build the RNN graph model.............]

sess.run(tf.global_variables_initializer())
# Because I am using my GPU for training, I avoid allocating the whole
# mnist.validation set (it causes a memory error), so I fragment it into
# small batches (100)
x_validation_bin, y_validation_bin = mnist.validation.next_batch(batch_size)
x_validation_bin = binarize(x_validation_bin, threshold=0.1)
x_validation_bin = x_validation_bin.reshape((-1,n_steps,n_input))

for k in range(epochs):

    steps = 0

    for i in range(training_iters):
        #Stochastic descent
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        batch_x = binarize(batch_x, threshold=0.1)
        batch_x = batch_x.reshape((-1,n_steps,n_input))
        sess.run(train_step, feed_dict={x: batch_x, y_true: batch_y,keep_prob: keep_prob,eta:learning_rate})

        if do_report_err == 1:
            if steps % display_step == 0:
                # Calculate batch accuracy
                acc = sess.run(accuracy, feed_dict={x: batch_x, y_true: batch_y,keep_prob: 1.0})
                # Calculate batch loss
                loss = sess.run(total_loss, feed_dict={x: batch_x, y_true: batch_y,keep_prob: 1.0})
                print("Iter " + str(i) + ", Minibatch Loss= " + "{:.6f}".format(loss) + ", Training Accuracy = " + "{:.5f}".format(acc))
        steps += 1




    # Validation Accuracy and Cost
    validation_accuracy = sess.run(accuracy,feed_dict={x:x_validation_bin, y_true:y_validation_bin, keep_prob:1.0})
    validation_cost = sess.run(total_loss,feed_dict={x:x_validation_bin, y_true:y_validation_bin, keep_prob:1.0})

    # Append the values computed just above (validation_cost / validation_accuracy)
    validation_loss_array.append(validation_cost)
    validation_accuracy_array.append(validation_accuracy)
    saver.save(sess, savefilename)
    total_epochs = total_epochs + 1

    np.savez(datasavefilename,epochs_saved = total_epochs,learning_rate_saved = learning_rate,keep_prob_saved = best_keep_prob, validation_loss_array_saved = validation_loss_array,validation_accuracy_array_saved = validation_accuracy_array,modelsavefilename = savefilename)

After that, the model has been trained successfully and the relevant data has been saved, so I want to load the file and run a final train and test evaluation of the model, this time on my CPU. The reason is that the GPU can't handle the whole dataset of mnist.train.images and mnist.train.labels at once.

So I manually select this part and run it:

with tf.device('/cpu:0'):
    # Initialise variables
    sess.run(tf.global_variables_initializer())

    # Accuracy and Cost
    saver.restore(sess, savefilename)
    x_train_bin = binarize(mnist.train.images, threshold=0.1)
    x_train_bin = x_train_bin.reshape((-1,n_steps,n_input))
    final_train_accuracy = sess.run(accuracy,feed_dict={x:x_train_bin, y_true:mnist.train.labels, keep_prob:1.0})
    final_train_cost = sess.run(total_loss,feed_dict={x:x_train_bin, y_true:mnist.train.labels, keep_prob:1.0})

    x_test_bin = binarize(mnist.test.images, threshold=0.1)
    x_test_bin = x_test_bin.reshape((-1,n_steps,n_input))
    final_test_accuracy = sess.run(accuracy,feed_dict={x:x_test_bin, y_true:mnist.test.labels, keep_prob:1.0})
    final_test_cost = sess.run(total_loss,feed_dict={x:x_test_bin, y_true:mnist.test.labels, keep_prob:1.0})

But I get a GPU OOM (out of memory) error, which doesn't make sense to me, since I think I have forced the program to use the CPU. I did not call sess.close() in the first (batched training) code, but I am not sure whether this is really the reason. I followed this post for the CPU part. Any suggestions on how to run the last part on the CPU only?

score 4 · Accepted Answer · answered Mar 07 '17 at 22:35

with tf.device() statements only apply to graph building, not to execution, so calling sess.run inside a device block is equivalent to not having the device block at all.

To do what you want to do you need to build separate training and test graphs, which share variables.
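As a rough illustration of what that could look like, here is a minimal sketch, not the asker's RNN: a hypothetical one-layer softmax model stands in for it, and the names logits_fn, loss_and_accuracy, the layer sizes and the random batch are all assumptions made for the example. It uses the TF1 graph API through tensorflow.compat.v1, builds the training ops under '/gpu:0' and a second copy of the evaluation ops under '/cpu:0', and shares the weights through a variable scope. allow_soft_placement is set only so the sketch also runs on a machine without a GPU.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

n_input, n_classes = 784, 10  # hypothetical sizes, not taken from the question


def logits_fn(inputs):
    # AUTO_REUSE: the first call creates "model/w" and "model/b",
    # every later call reuses the same variables instead of creating new ones.
    with tf.variable_scope("model", reuse=tf.AUTO_REUSE):
        w = tf.get_variable("w", [n_input, n_classes],
                            initializer=tf.zeros_initializer())
        b = tf.get_variable("b", [n_classes],
                            initializer=tf.zeros_initializer())
        return tf.matmul(inputs, w) + b


def loss_and_accuracy(inputs, labels):
    logits = logits_fn(inputs)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    return loss, tf.reduce_mean(tf.cast(correct, tf.float32))


x = tf.placeholder(tf.float32, [None, n_input])
y_true = tf.placeholder(tf.float32, [None, n_classes])

# Training ops are built (not run) under the GPU device scope.
with tf.device("/gpu:0"):
    train_loss, train_acc = loss_and_accuracy(x, y_true)
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(train_loss)

# A second set of evaluation ops is built under the CPU device scope;
# it reads the same variables as the training ops.
with tf.device("/cpu:0"):
    eval_loss, eval_acc = loss_and_accuracy(x, y_true)

# Soft placement lets GPU-pinned ops fall back to the CPU if no GPU exists.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())

    # Random stand-in data, only to exercise both sets of ops.
    rng = np.random.RandomState(0)
    batch_x = rng.rand(32, n_input).astype(np.float32)
    batch_y = np.eye(n_classes, dtype=np.float32)[rng.randint(0, n_classes, 32)]

    sess.run(train_step, feed_dict={x: batch_x, y_true: batch_y})
    final_loss, final_acc = sess.run([eval_loss, eval_acc],
                                     feed_dict={x: batch_x, y_true: batch_y})
```

In this sketch the device is decided when each op is created, so which ops you pass to sess.run determines where the work happens: train_step uses the GPU-placed ops, eval_loss and eval_acc use the CPU-placed ones. The variables themselves live wherever they were first created (here the GPU scope, if a GPU is present); only the evaluation computation and its activations are pinned to the CPU.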

answered Mar 07 '17 at 22:35

Alexandre Passos

Thanks, very interesting. Do you mean building a training graph by `with tf.device('gpu:0')` and `'gpu:1'` for evaluation, then `tf.Session()` should be outside? Is that still valid in TF2? I encountered some batch_norm/dropout issue during inference – Zézouille Nov 28 '19 at 10:33
