8

I am looking for a simple way of verifying that my TF graphs are actually running on the GPU.

PS. It would also be nice to verify that the cuDNN library is used.

talonmies
  • 70,661
  • 34
  • 192
  • 269
Toke Faurby
  • 5,788
  • 9
  • 41
  • 62
  • Running with `nvprof` can give detailed information about CUDA function calls. Or just run `nvidia-smi` to check GPU utilization while the script is running. – Kh40tiK Mar 01 '17 at 13:10
  • Possible duplicate of [How does one have TensorFlow not run the script unless the GPU was loaded successfully?](http://stackoverflow.com/questions/42403501/how-does-one-have-tensorflow-not-run-the-script-unless-the-gpu-was-loaded-succes) – Hugh Perkins May 01 '17 at 23:55

3 Answers

9

There are several ways to view op placement.

  1. Add RunOptions and RunMetadata to the session call and view the placement of ops and computations in TensorBoard. See the code here: https://www.tensorflow.org/get_started/graph_viz

  2. Specify the log_device_placement option in the session's ConfigProto. This logs to the console which device each operation is placed on (a minimal sketch combining this with option 1 is shown after this list). https://www.tensorflow.org/api_docs/python/tf/ConfigProto

  3. View GPU usage in the terminal using nvidia-smi.
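
For example, a minimal sketch combining options 1 and 2 (this assumes the TF 1.x graph/session API used in the other answers; the log directory /tmp/tf_logs is just a placeholder):

import tensorflow as tf

# A small graph whose placement we want to inspect.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)

# Option 2: log device placement to the console.
config = tf.ConfigProto(log_device_placement=True)

with tf.Session(config=config) as sess:
    # Option 1: collect run metadata and write it for TensorBoard's graph view.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(b, options=run_options, run_metadata=run_metadata)

    writer = tf.summary.FileWriter('/tmp/tf_logs', sess.graph)
    writer.add_run_metadata(run_metadata, 'step0')
    writer.close()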

pfredriksen
  • 223
  • 2
  • 5
5

When you import TensorFlow in Python

import tensorflow as tf

You will get log messages like these, indicating that the CUDA libraries were loaded:

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

Also, when you build a graph and run a session with log_device_placement set in the ConfigProto, you will get logs like these (showing that a GPU device was found):

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.759
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 4.94GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
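
If you prefer a programmatic check over reading the logs, a small sketch like this (it uses device_lib, an internal TensorFlow module, so treat it as a convenience rather than a stable API) lists the devices TensorFlow can see:

from tensorflow.python.client import device_lib

# Every visible device is returned; GPUs have device_type == 'GPU'.
devices = device_lib.list_local_devices()
print([d.name for d in devices if d.device_type == 'GPU'])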
Harsha Pokkalla
  • 1,792
  • 1
  • 12
  • 17
  • I'm getting the same output as described above, but the time my models take to train is the same as on the CPU. Not sure what I'm doing wrong. I'm using the DNNRegressor Estimator, and the same code runs on CPU and GPU without any modification, as I learned that Estimators by default pick the GPU for execution if one is available. Any way to figure this out? – user3457384 Oct 09 '17 at 13:30
  • Could you check GPU usage? Run `watch nvidia-smi` and look at memory usage and GPU utilization. – Harsha Pokkalla Oct 09 '17 at 15:04
  • Here is my problem explained in detail. https://stackoverflow.com/questions/46648484/how-to-make-best-use-of-gpu-for-tensorflow-estimators – user3457384 Oct 09 '17 at 16:51
0

There is a related TensorFlow upstream issue. Basically it says that the Python API does not expose such information yet.

The C++ API, however, does; for example, there is tensorflow::KernelsRegisteredForOp(). I wrote a small Python wrapper around that and then implemented supported_devices_for_op here (in this commit).

Albert
  • 65,406
  • 61
  • 242
  • 386