I installed TensorFlow with GPU support, CUDA 7.0 and cuDNN 6.5. Importing tensorflow works fine.

I am trying to run a simple matrix multiplication in TensorFlow, but it will not use my GPU even though it seems to recognize it. I have this issue both on my computer, which has an NVIDIA GeForce GTX 970M, and on a cluster with two Titan Z cards.

My first code is:

import tensorflow as tf
import numpy as np

size=100
# Create two random matrices (np.random.random_sample returns float64)
mat1 = np.random.random_sample([size, size])*100
mat2 = np.random.random_sample([size, size])*100

a = tf.constant(mat1)
b = tf.constant(mat2)
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(c)

This code runs, and the device placement log shows:

Const_1: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:289] Const_1: /job:localhost/replica:0/task:0/gpu:0
Const: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:289] Const: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:289] MatMul: /job:localhost/replica:0/task:0/cpu:0

So as I understand it, TensorFlow uses my GPU to create the constants but not for the matmul, which is weird. Then I forced the GPU like this:

with tf.device("/gpu:0"):
    a = tf.constant(mat1)
    b = tf.constant(mat2)
    c = tf.matmul(a, b)
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    sess.run(c)

And TensorFlow returns:

InvalidArgumentError: Cannot assign a device to node 'MatMul': Could not satisfy explicit device specification '/gpu:0'

If anyone has had the same problem or has an idea, I would be glad to read your answer!

  • Perhaps you are hitting this issue https://github.com/tensorflow/tensorflow/issues/29 "Tensorflow seems to require a cuda compute capability of 3.5," - Some of us have only 3.0. Check yours here: https://en.wikipedia.org/wiki/CUDA#Supported_GPUs – GavinBrelstaff Feb 16 '16 at 10:32
  • Thanks for your answer, but I checked: the GTX 970M has a compute capability of 5.2 and the Titan Z has 3.5. Moreover, when I run the ./configure.sh script it tells me [Default is: "3.5,5.2"], so I think it is fine on this side. – Maxence Queyrel Feb 16 '16 at 11:27
  • try mat1=np.random.random_sample([size, size]).astype(np.float32)*100 – Yaroslav Bulatov Feb 16 '16 at 22:02
  • Oh, it works! Thank you very much! So now I know that float64 is not a good idea for matrix multiplication ;) – Maxence Queyrel Feb 17 '16 at 08:57
  • well it depends what you need to do. If you need to minimize a loss function to "arbitrary" precision then you need doubles – stefano Feb 17 '16 at 10:02
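
For reference, the dtype change suggested in the comments can be sketched with NumPy alone. This is a minimal illustration, not the asker's final code: the name mat64 is introduced here only for comparison. The result reported above suggests that this TensorFlow build could not place a float64 MatMul on the GPU, while float32 inputs worked.

```python
import numpy as np

size = 100

# np.random.random_sample returns float64 by default, which is what
# the original code passed to tf.constant.
mat64 = np.random.random_sample([size, size]) * 100  # illustrative name

# Cast to float32 before building the constants, as suggested in the comments.
mat1 = np.random.random_sample([size, size]).astype(np.float32) * 100
mat2 = np.random.random_sample([size, size]).astype(np.float32) * 100

print(mat64.dtype)  # float64
print(mat1.dtype)   # float32
```

With mat1 and mat2 built this way, the rest of the original script (tf.constant, tf.matmul, sess.run) stays unchanged. As noted in the comments, float32 trades precision for GPU placement, so it may not suit every use case.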

1 Answer

I do not have enough reputation to comment. I have come across a similar issue; my question is here:

TensorFlow: critical graph operations assigned to cpu rather than gpu

stefano