I want to limit the memory usage per GPU. As suggested in this answer, I did the following:

import tensorflow as tf

# FLAGS and model are defined elsewhere in the script.
# Cap this process at 90% of each visible GPU's memory.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)
config = tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_options)
saver = tf.train.Saver()
sv = tf.train.Supervisor(logdir=FLAGS.log_root,
                         is_chief=True,
                         saver=saver,
                         summary_op=None,
                         save_summaries_secs=60,
                         save_model_secs=FLAGS.checkpoint_secs,
                         global_step=model.global_step)
sess = sv.prepare_or_wait_for_session(config=config)

But it still does not work (the GPU-Util of one of the GPUs has reached 100%). Could you please tell me how to fix this issue? Thanks in advance!

user5779223
  • What do you mean by usage? Could you show an example of your `nvidia-smi` output? – jkschin Jun 30 '17 at 05:20
  • @jkschin The `Volatile GPU-Util ` of one GPU is 100% – user5779223 Jun 30 '17 at 05:22
  • I think it's normal. See what `Volatile GPU-Util` means [here](https://stackoverflow.com/questions/40937894/nvidia-smi-volatile-gpu-utilization-explanation). You should look at `Memory Usage`. – jkschin Jun 30 '17 at 05:27
  • @jkschin Thanks for the correction. But it is weird: when `Volatile GPU-Util` is at 100%, `pool_allocator` runs quite slowly. – user5779223 Jun 30 '17 at 05:37
  • P.S. It's actually just `GPU-Util`; the "Volatile" belongs to something else. I don't know what you mean by slow, but all `GPU-Util` shows is the percentage of time during which kernels were running on the GPU. If your kernels are extremely complex operations, I wouldn't be surprised to see 100%. – jkschin Jun 30 '17 at 05:42
  • @jkschin After creating the device, TF does the pool allocation, and this step takes a lot of time when `GPU-Util` is at 100%. – user5779223 Jun 30 '17 at 06:05

1 Answer

This post talks more about what GPU-Util actually means.

Note that it's not Volatile GPU-Util. The "Volatile" actually belongs to Volatile Uncorr. ECC. GPU-Util exists on its own.

With regard to your question, seeing 100% GPU-Util is perfectly normal: it measures how busy the GPU is, not how much memory is in use. To see whether memory is being limited, look at Memory-Usage instead and estimate how much has been allocated.
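
If you would rather check this from a script than by reading the table, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH and that its `--query-gpu` CSV output reports plain numbers for every field (some GPUs print `[N/A]` instead, which this sketch does not handle). The helper names (`query_nvidia_smi`, `parse_gpu_stats`, `within_memory_cap`, `print_report`) and the `slack_mib` parameter are made up for illustration; they are not part of TensorFlow or nvidia-smi.

```python
import subprocess

# Fields requested from nvidia-smi; memory values come back in MiB.
QUERY_FIELDS = "index,memory.used,memory.total,utilization.gpu"


def query_nvidia_smi():
    """Return the raw CSV text from nvidia-smi (needs an NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + QUERY_FIELDS,
        "--format=csv,noheader,nounits",
    ]
    return subprocess.check_output(cmd).decode("utf-8")


def parse_gpu_stats(csv_text):
    """Parse one CSV line per GPU into a dict of numbers."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, used, total, util = [field.strip() for field in line.split(",")]
        stats.append({
            "index": int(index),
            "used_mib": float(used),
            "total_mib": float(total),
            "util_pct": float(util),
        })
    return stats


def within_memory_cap(stat, fraction, slack_mib=0.0):
    """True if used memory is at most fraction * total (+ optional slack).

    nvidia-smi reports memory used by all processes on the GPU, so this is
    only an estimate of what a single TensorFlow process has allocated.
    """
    return stat["used_mib"] <= fraction * stat["total_mib"] + slack_mib


def print_report(fraction=0.9):
    """Print memory usage and GPU-Util side by side for each GPU."""
    for gpu in parse_gpu_stats(query_nvidia_smi()):
        print("GPU %d: %.0f / %.0f MiB used, GPU-Util %.0f%%, within cap: %s" % (
            gpu["index"], gpu["used_mib"], gpu["total_mib"],
            gpu["util_pct"], within_memory_cap(gpu, fraction)))
```

Calling `print_report(0.9)` while training is running shows the two numbers next to each other: a GPU can sit at 100% GPU-Util while its memory usage stays under the 0.9 cap, which is the situation described in the question.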

jkschin