
Running on Ubuntu 16.04 with the latest TensorFlow (1.1.0, installed via pip3 install tensorflow-gpu), CUDA 8 and cuDNN 5.

The code looks more or less like this:

import tensorflow as tf
from tensorflow.contrib.learn import KMeansClustering

trainencflt = ...  # pandas DataFrame with ~30k rows and ~300 columns

def train_input_fn():
    return (tf.constant(trainencflt, shape=[trainencflt.shape[0], trainencflt.shape[1]]), None)

configuration = tf.contrib.learn.RunConfig(log_device_placement=True)
model = KMeansClustering(num_clusters=k,  # k is defined elsewhere
                         initial_clusters=KMeansClustering.RANDOM_INIT,
                         relative_tolerance=1e-8,
                         config=configuration)
model.fit(input_fn=train_input_fn, steps=100)

When it runs I see:

2017-06-15 10:24:41.564890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:81:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
2017-06-15 10:24:41.564934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-06-15 10:24:41.564942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y
2017-06-15 10:24:41.564956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:81:00.0)

Memory gets allocated:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1       548    C   python                                        7745MiB |
+-----------------------------------------------------------------------------+

But then none of the operations are actually performed on the GPU (its utilization stays at 0% the whole time, while CPU utilization skyrockets on all cores):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   1  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 29%   43C    P8    13W / 180W |   7747MiB /  8114MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I'm also not seeing any placement logs, even though I set log_device_placement to True.

I did try the simple GPU examples and they worked just fine (at least their placement logs looked correct).
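For reference, by "simple GPU examples" I mean something along the lines of the canonical matmul check (a sketch; the exact constants are just illustrative):

import tensorflow as tf

# With log_device_placement=True, each op's assigned device is logged to stderr.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
b = tf.constant([[1.0, 1.0], [0.0, 1.0]], name='b')
c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))  # placement lines such as "MatMul: ... gpu:0" appear in the log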

Am I missing something?

Mateusz Dymczyk
  • TensorFlow by default allocates all available memory. See: https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory It may not be running on GPU because this does not have GPU kernels written for it. – jkschin Jun 16 '17 at 09:34
  • @jkschin yes I'm aware of the memory allocation patterns - was just showing that as proof of using TF GPU. As for gpu kernels, I tried going through the code and the internals seem to be using a lot of basic TF operations, which I think should work on the GPU, but I'm not sure. – Mateusz Dymczyk Jun 16 '17 at 09:57
  • An experiment you can try would be to assert tf.device and no soft placement, and then see what errors are thrown? – jkschin Jun 16 '17 at 09:58
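A sketch of the experiment suggested in the comments, combining the memory-allocation fix with hard device placement; the matmul is just a toy stand-in for the ops under test, and these are standard ConfigProto options rather than anything specific to KMeansClustering:

import tensorflow as tf

# Stop TF from grabbing all GPU memory up front and forbid silent CPU fallback.
config = tf.ConfigProto(allow_soft_placement=False, log_device_placement=True)
config.gpu_options.allow_growth = True

with tf.device('/gpu:0'):
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)  # toy op standing in for the ops under test

with tf.Session(config=config) as sess:
    # Any op pinned to the GPU without a registered GPU kernel now raises an
    # InvalidArgumentError ("Cannot assign a device for operation ...")
    # instead of silently running on the CPU.
    print(sess.run(y))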

1 Answer


I went through the codebase - TF 1.1.0 simply didn't have a GPU kernel for these ops.
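For anyone wanting to check this kind of thing programmatically on a newer TensorFlow release, there is an unofficial helper in a private module that lists the registered kernels for an op; treat the import path and the op name below as assumptions to verify against your own version:

from tensorflow.python.framework import kernels  # private module, newer TF releases only

# "MatMul" is only an example op name; substitute the actual clustering ops.
kernel_list = kernels.get_registered_kernels_for_op("MatMul")
print({k.device_type for k in kernel_list.kernel})  # e.g. {'CPU', 'GPU'}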

Mateusz Dymczyk