
I am running TensorFlow on a Google Cloud instance with a single GPU. I believe it is using the GPU, based on the following output:

2017-04-27 06:24:23.173402: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.173558: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.173607: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.173646: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.173700: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.341713: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-04-27 06:24:23.342735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:04.0
Total memory: 11.17GiB
Free memory: 11.09GiB
2017-04-27 06:24:23.342994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-04-27 06:24:23.343049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-04-27 06:24:23.343103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0)
2017-04-27 06:24:24.069732: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 18226 get requests, put_count=10104 evicted_count=1000 eviction_rate=0.0989707 and unsatisfied allocation rate=0.50598
2017-04-27 06:24:24.069915: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
2017-04-27 06:24:24.566246: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 18363 get requests, put_count=10456 evicted_count=1000 eviction_rate=0.0956389 and unsatisfied allocation rate=0.486304
2017-04-27 06:24:24.566429: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
batch 0: global_norm = 15739501.000
2017-04-27 06:24:25.017334: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1023 get requests, put_count=2058 evicted_count=1000 eviction_rate=0.485909 and unsatisfied allocation rate=0.00879765
2017-04-27 06:24:25.017506: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 493 to 542
2017-04-27 06:24:25.480102: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3475 get requests, put_count=6528 evicted_count=3000 eviction_rate=0.459559 and unsatisfied allocation rate=0.00028777
batch 1: global_norm = 14174161.000
2017-04-27 06:24:25.945373: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4492 get requests, put_count=8551 evicted_count=4000 eviction_rate=0.467782 and unsatisfied allocation rate=0
2017-04-27 06:24:26.412995: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6726 get requests, put_count=12791 evicted_count=6000 eviction_rate=0.46908 and unsatisfied allocation rate=0
batch 2: global_norm = 33107152.000
2017-04-27 06:24:26.882972: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 7952 get requests, put_count=15024 evicted_count=7000 eviction_rate=0.465921 and unsatisfied allocation rate=0
batch 3: global_norm = 15763463.000
2017-04-27 06:24:27.600348: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1256 get requests, put_count=2343 evicted_count=1000 eviction_rate=0.426803 and unsatisfied allocation rate=0
2017-04-27 06:24:28.072395: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3494 get requests, put_count=6589 evicted_count=3000 eviction_rate=0.455304 and unsatisfied allocation rate=0
batch 4: global_norm = 21566338.000
2017-04-27 06:24:28.549896: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 5730 get requests, put_count=10835 evicted_count=5000 eviction_rate=0.461467 and unsatisfied allocation rate=0
2017-04-27 06:24:29.028344: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 7757 get requests, put_count=14872 evicted_count=7000 eviction_rate=0.470683 and unsatisfied allocation rate=0
batch 5: global_norm = 21483036.000
2017-04-27 06:24:29.768236: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1277 get requests, put_count=2417 evicted_count=1000 eviction_rate=0.413736 and unsatisfied allocation rate=0
batch 6: global_norm = 11463346.000
2017-04-27 06:24:30.257765: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4525 get requests, put_count=8679 evicted_count=4000 eviction_rate=0.460883 and unsatisfied allocation rate=0
2017-04-27 06:24:30.752543: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6977 get requests, put_count=13146 evicted_count=6000 eviction_rate=0.456413 and unsatisfied allocation rate=0
batch 7: global_norm = 11743794.000
2017-04-27 06:24:31.522024: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2314 get requests, put_count=4518 evicted_count=2000 eviction_rate=0.442674 and unsatisfied allocation rate=0
batch 8: global_norm = 7594899.500
2017-04-27 06:24:32.030184: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 5775 get requests, put_count=11000 evicted_count=5000 eviction_rate=0.454545 and unsatisfied allocation rate=0
batch 9: global_norm = 12924121.000
2017-04-27 06:24:32.832804: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2349 get requests, put_count=4621 evicted_count=2000 eviction_rate=0.432807 and unsatisfied allocation rate=0
batch 10: global_norm = 7920631.000
2017-04-27 06:24:33.656608: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 27653 get requests, put_count=28155 evicted_count=6000 eviction_rate=0.213106 and unsatisfied allocation rate=0.209634
2017-04-27 06:24:33.656770: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 3296 to 3625
batch 11: global_norm = 7384579.000
2017-04-27 06:24:34.519065: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 27351 get requests, put_count=27607 evicted_count=5000 eviction_rate=0.181113 and unsatisfied allocation rate=0.186684
2017-04-27 06:24:34.519240: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 3987 to 4385
batch 12: global_norm = 9704661.000
2017-04-27 06:24:35.432504: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1618 get requests, put_count=3100 evicted_count=1000 eviction_rate=0.322581 and unsatisfied allocation rate=0
batch 13: global_norm = 10564804.000
2017-04-27 06:24:36.776085: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1675 get requests, put_count=3316 evicted_count=1000 eviction_rate=0.301568 and unsatisfied allocation rate=0

Ignore the 'batch n: global_norm = ...' lines; they come from my own code. Near the top, I see

2017-04-27 06:24:23.343103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0)

According to How to tell if tensorflow is using gpu acceleration from inside python shell?, this message indicates that my TensorFlow is using the GPU.
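
For reference, this is roughly how I understand one can list the devices TensorFlow sees from inside Python (a minimal sketch based on that linked question, not something specific to my setup):

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can use; a GPU entry such as
# "/gpu:0 ... Tesla K80" should appear if the GPU is visible.
print(device_lib.list_local_devices())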

On the other hand, I don't see the usual messages saying that the CUDA libraries were successfully opened. For example, I would expect to see messages like these, copied from the link above:

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

But I don't see them when running my code on the Google Cloud instance. Does this mean that my TensorFlow is using the GPU but not taking full advantage of the CUDA libraries (whatever that would mean)? For what it's worth, computation is significantly faster on the Google Cloud instance than on my local computer, which doesn't use a GPU for TensorFlow: a single epoch of a recurrent neural network with 500 units took 91 s on Google Cloud versus 1077 s locally. That seems like evidence that the instance is using the GPU, but I am wondering whether it could be even faster if it is currently not using the CUDA libraries.
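
In case it helps, this is the kind of check I plan to run to see where ops actually execute (a minimal sketch using log_device_placement; the matmul here is just an illustrative op, not part of my model):

import tensorflow as tf

# Pin a small op to the GPU and ask TensorFlow to log where each op runs.
with tf.device('/gpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    # The device placement log should show MatMul assigned to gpu:0
    # if the GPU is really being used.
    print(sess.run(c))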
