I try to monitor the usage of my tensorflow models with timeline. This link explains how to use it: https://stackoverflow.com/a/37774470/6716760. The minimal example here is:
import tensorflow as tf
from tensorflow.python.client import timeline
x = tf.random_normal([1000, 1000])
y = tf.random_normal([1000, 1000])
res = tf.matmul(x, y)
# Run the graph with full trace option
with tf.Session() as sess:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(res, options=run_options, run_metadata=run_metadata)
# Create the Timeline object, and write it to a json
tl = timeline.Timeline(run_metadata.step_stats)
ctf = tl.generate_chrome_trace_format()
with open('timeline.json', 'w') as f:
f.write(ctf)
Unfortunately I get the following error when I try to execute the script:
An error ocurred while starting the kernel
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:0a:00.0
Total memory: 11.90GiB
Free memory: 11.61GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x28f93b0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:09:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2c976b0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:06:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2ba5d80
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:05:00.0
Total memory: 11.89GiB
Free memory: 11.52GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) ‑> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) ‑> (device: 1, name: TITAN X (Pascal), pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) ‑> (device: 2, name: TITAN X (Pascal), pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) ‑> (device: 3, name: TITAN X (Pascal), pci bus id: 0000:05:00.0)
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH: /usr/local/cuda/lib64
F tensorflow/core/platform/default/gpu/cupti_wrapper.cc:59] Check failed: ::tensorflow::Status::OK() == (::tensorflow::Env::Default()‑>GetSymbolFromLibrary( GetDsoHandle(), kName, &f)) (OK vs. Not found: /home/sysgen/anaconda3/lib/python3.5/site‑packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cuptiActivityRegisterCallbacks)could not find cuptiActivityRegisterCallbacksin libcupti DSO
The error is hidden in the last line at the end. But what does this mean? How can I fix it?