Linux: Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-38-generic x86_64)
TensorFlow: 1.4, compiled from source
GPU: 4xP100
I am trying the newly released object detection tutorial training program. I noticed a big difference in GPU utilization depending on the value of CUDA_VISIBLE_DEVICES. Specifically, when it is set to "gpu:0", GPU utilization is quite high, around 80%-90%, but when I set it to other GPU devices, such as gpu:1 or gpu:2, GPU utilization is very low, between 10% and 30%.
As for the training speed, it seems to be roughly the same in both cases, and much faster than when using the CPU only.
I am just curious why this happens.
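
For reference, here is a minimal sketch of how the per-GPU comparison could be reproduced. It rests on a few assumptions: that `nvidia-smi` is on the PATH, that CUDA_VISIBLE_DEVICES is given a bare index such as `1`, and that CUDA_DEVICE_ORDER is set to PCI_BUS_ID so that CUDA numbers the devices the same way `nvidia-smi` does. The helper names (`env_for_gpu`, `parse_utilization`, `query_utilization`) and the `train.py` script in the comment are hypothetical, not part of the tutorial.

```python
import os
import subprocess


def env_for_gpu(physical_index):
    """Return a copy of the environment that exposes a single GPU."""
    env = dict(os.environ)
    # Ask CUDA to enumerate devices in PCI bus order, matching nvidia-smi.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # The value is a bare device index such as "1".
    env["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return env


def parse_utilization(csv_text):
    """Parse 'index, utilization' CSV lines into {index: percent}."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, util = [field.strip() for field in line.split(",")]
        result[int(index)] = int(util)
    return result


def query_utilization():
    """Ask nvidia-smi for the current utilization of every physical GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_utilization(out)


# Hypothetical usage: start training on physical GPU 1, then poll utilization.
# proc = subprocess.Popen(["python", "train.py"], env=env_for_gpu(1))
# print(query_utilization())
```

Note that inside a process started this way, the single visible GPU always appears as device 0, whichever physical card was selected, so the numbers reported by `nvidia-smi` refer to physical indices rather than the index the training process sees.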