I am looking for a simple way of verifying that my TF
graphs are actually running on the GPU.
PS. It would also be nice to verify that the cuDNN
library is used.
There are several ways to view op placement.
Add RunOptions and RunMetadata to the session call and view the placement of ops and computations in TensorBoard (a minimal sketch follows this list). See the code here: https://www.tensorflow.org/get_started/graph_viz
Specify the log_device_placement option in the session ConfigProto. This logs to the console which device each operation is placed on. https://www.tensorflow.org/api_docs/python/tf/ConfigProto
View GPU usage in the terminal using nvidia-smi (e.g. run watch -n 1 nvidia-smi while the graph is executing and check whether GPU utilization and memory usage go up).
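Here is a minimal sketch of the first approach, assuming the TF 1.x API that was current at the time; the ops and the log directory /tmp/tf_logs are arbitrary:

import tensorflow as tf

a = tf.random_normal([1000, 1000])
b = tf.random_normal([1000, 1000])
c = tf.matmul(a, b)

# Ask the runtime to trace this step and collect the result in RunMetadata.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(c, options=run_options, run_metadata=run_metadata)

# Write the graph plus the traced step; TensorBoard's graph view then
# lets you color nodes by device and inspect compute timings.
writer = tf.summary.FileWriter('/tmp/tf_logs', graph=tf.get_default_graph())
writer.add_run_metadata(run_metadata, 'step0')
writer.close()

Then run tensorboard --logdir /tmp/tf_logs, open the graph tab, and color the graph by device to see where each op was placed.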
When you import TF in Python:
import tensorflow as tf
you will get logs like these, which indicate that the CUDA libraries were loaded:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
(The libcudnn.so.5 line above also confirms that the cuDNN library was loaded, which answers the PS.) Also, when you build a graph and run a session with log_device_placement set in the ConfigProto (see the sketch after the logs below), you will get logs like these, showing that it found a GPU device:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.759
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 4.94GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
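For reference, a minimal session that triggers this placement logging might look like this (again TF 1.x API; the ops are arbitrary):

import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0], name='a')
b = tf.constant([1.0, 2.0, 3.0], name='b')
c = a + b

# log_device_placement=True makes the runtime print the device assigned
# to each op, e.g. "add: (Add): /job:localhost/replica:0/task:0/gpu:0".
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))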
There is a related TensorFlow upstream issue. Basically, it says that the Python API does not yet expose which devices an op can run on. The C++ API, however, does: e.g. there is tensorflow::KernelsRegisteredForOp(). I wrote a small Python wrapper around that and then implemented supported_devices_for_op here (in this commit).
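Usage of that wrapper might then look like this (a hypothetical sketch: supported_devices_for_op and its return format come from the commit above, not from any public TensorFlow API):

# Hypothetical: query which device types have a registered kernel for an op.
print(supported_devices_for_op("MatMul"))
# e.g. ['CPU', 'GPU'] if a GPU kernel is registered for MatMul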