The program I am using is copy-pasted from here, with a few changes. This is my code, with an attempt to boost the speed of training:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

with tf.device('/gpu:0'):
  W_conv1 = weight_variable([5, 5, 1, 32])
  b_conv1 = bias_variable([32])
  x_image = tf.reshape(x, [-1, 28, 28, 1])
  h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
  h_pool1 = max_pool_2x2(h_conv1)

  W_conv2 = weight_variable([5, 5, 32, 64])
  b_conv2 = bias_variable([64])

  h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
  h_pool2 = max_pool_2x2(h_conv2)

  W_fc1 = weight_variable([7 * 7 * 64, 1024])
  b_fc1 = bias_variable([1024])

  h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
  h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

  keep_prob = tf.placeholder(tf.float32)
  h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

  W_fc2 = weight_variable([1024, 10])
  b_fc2 = bias_variable([10])

  y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

  cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
  train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
  correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

  with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
      batch = mnist.train.next_batch(50)
      if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
        print('step %d, training accuracy %g' % (i, train_accuracy))
      train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

Which produces the following output:

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0, training accuracy 0.22
step 100, training accuracy 0.76
step 200, training accuracy 0.88
...

The problem is that there is no measurable difference in runtime between the original code from the tutorial (i.e. without the with tf.device('/gpu:0'): line) and this code: both take about 10 seconds for each step (a rough way to time the steps is sketched after the nvidia-smi output below). I have installed cuda-8.0 and cuDNN successfully (after many hours of failed attempts). "$ nvidia-smi" returns the following output:

Sun Jul  2 13:57:10 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   49C    P0    N/A /  N/A |    406MiB /  2000MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+


+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
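
For reference, here is a rough sketch of how the per-step time can be measured. This timing code is not part of the original program; it assumes it is placed inside the with tf.Session(...) block above, where mnist, train_step, x, y_ and keep_prob are in scope.

import time

# Rough per-step timing (sketch only; run inside the session block above).
n_steps = 100
start = time.time()
for i in range(n_steps):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
print('seconds per step: %g' % ((time.time() - start) / n_steps))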

So the questions are:

1) Is the workload too small to show any difference between choosing a CPU or a GPU? 2) Or is there some stupid mistake in my implementation?

Thanks for reading the whole question.

Roofi
    It simply means GPU is used by default when available. You should rather explicitly use the CPU to measure the difference. – P-Gn Jul 02 '17 at 08:57
  • Thanks @user1735003. I tried what you suggested (replaced gpu with cpu). The result was that each step was 5 seconds longer. It should be faster, right? Also, when I copy-pasted the original code from the website and compared it with the above-mentioned code, there was no observable difference. Can you tell me why? – Roofi Jul 02 '17 at 09:33

1 Answer


The fact that you can run this code with no errors suggests that TensorFlow can definitely run with a GPU. The problem here is that when you run TensorFlow as is, it tries to run on the GPU by default. There are a few ways you can force it to run on the CPU.

  1. Run it this way: CUDA_VISIBLE_DEVICES= python code.py. Note that when you do this and still have with tf.device('/gpu:0'), it will break, so remove it.
  2. Change with tf.device('/gpu:0') to with tf.device('/cpu:0') (a minimal sketch follows this list)
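
As a minimal sketch of option 2 (a toy graph rather than the full MNIST model from the question), the change looks like this; option 1 is run from the shell instead:

import tensorflow as tf

# Option 2 (sketch): pin the ops to the CPU by changing the device string.
with tf.device('/cpu:0'):
  a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
  b = tf.matmul(a, a)

# Option 1 would instead be invoked from the shell, with the
# tf.device('/gpu:0') line removed from the script:
#   CUDA_VISIBLE_DEVICES= python code.py
with tf.Session() as sess:
  print(sess.run(b))

Timing the training loop under both placements should then show whether the GPU actually speeds things up.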

EDIT from question in comments

See this answer for more information on what allow_soft_placement and log_device_placement mean in ConfigProto: https://stackoverflow.com/questions/44873273/what-do-the-options-in-configproto-like-allow-soft-placement-and-log-device-plac/44873274#44873274
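
As a rough illustration (not taken from the linked answer): both options are fields of tf.ConfigProto, and with log_device_placement=True TensorFlow logs the device each op is assigned to when the session runs.

import tensorflow as tf

# allow_soft_placement=True : fall back to an available device if the
#                             requested one does not exist
# log_device_placement=True : log which device each op is placed on
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
  print(sess.run(tf.constant(42)))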

jkschin
  • Sorry for being unclear @jkschin, but does the statement "TensorFlow as is, by default, it tries to run on the GPU" hold true even when I do not mention `config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)` inside the Session's parentheses? – Roofi Jul 02 '17 at 17:04
  • These parameters do not affect whether it's run on a GPU or not. See [here](https://stackoverflow.com/questions/44873273/what-do-the-options-in-configproto-like-allow-soft-placement-and-log-device-plac/44873274#44873274) for more information. – jkschin Jul 02 '17 at 17:20
  • Please add your last comment to the answer (for future googlers) @jkschin – Roofi Jul 04 '17 at 09:07