I am using two graphic cards and the GeForce gtx980 with 4GB, where I compute my neuronal network is always jumping from 0 to 99% and from 99% to 0% (repeating) at the last line of the pasted shell output.
After around 90seconds it did the first calculation. I put my images one after another into the neuronal network (for-loop). And the following calculations only need 20 seconds (3 epochs) and the GPU jumps between 96 and 100%.
Why is it jumping at the beginning?
I use the flag:
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
Can I be sure that is really using not less megabytes than nvidia-smi -lms 50
is showing me?
2017-08-10 16:33:24.836084: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 16:33:24.836100: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 16:33:25.052501: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-10 16:33:25.052861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.87GiB
2017-08-10 16:33:25.187760: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x8532640 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-08-10 16:33:25.188006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-10 16:33:25.188291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: GeForce GT 730
major: 3 minor: 5 memoryClockRate (GHz) 0.9015
pciBusID 0000:02:00.0
Total memory: 1.95GiB
Free memory: 1.45GiB
2017-08-10 16:33:25.188312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2017-08-10 16:33:25.188319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2017-08-10 16:33:25.188329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2017-08-10 16:33:25.188335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2017-08-10 16:33:25.188339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2017-08-10 16:33:25.188348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:03:00.0)
Epoche: 0001 cost= 0.620101001 time= 115.366318226
Epoche: 0004 cost= 0.335480299 time= 19.4528050423