
I have several GPUs but I only want to use one GPU for my training. I am using the following options:

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:

Despite setting/using all these options, memory is allocated on all of my GPUs and

#processes = #GPUs

How can I prevent this from happening?

Note

  1. I do not want to set the devices manually and I do not want to set CUDA_VISIBLE_DEVICES, since I want tensorflow to automatically find the best (i.e. an idle) GPU available
  2. When I try to start another run, it uses the same GPU that is already used by another tensorflow process, even though there are several other free GPUs (apart from the memory allocated on them)
  3. I am running tensorflow in a docker container: tensorflow/tensorflow:latest-devel-gpu-py
  • That seems very weird. Could you please try to post the full code and the TF version you're using? – Matan Hugi Dec 20 '17 at 21:01
  • Did you try to specify an initial memory fraction? `gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333) sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))` (expanded in the sketch after these comments) https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory – y.selivonchyk Dec 20 '17 at 21:26
  • The full code spans more than 5 scripts, so unfortunately I cannot post all of it, but I think I made my point clear? Or is there anything specific you would like to see? I have added the tensorflow version I am working with. @MatanHugi –  Dec 21 '17 at 07:27
  • No, I have not, but I am sure this will not help me with my problem. @yauheni_selivonchyk –  Dec 21 '17 at 07:29
  • Tensorflow has no logic to find the best (idle) GPU available. – Alexandre Passos Dec 21 '17 at 21:04
  • Hm, okay, that's bad, but at least it shouldn't allocate memory on every single GPU, nor should it start as many processes as there are GPUs –  Dec 22 '17 at 07:46
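For reference, the memory-fraction suggestion from the comments looks like this in context (a minimal sketch assuming the TF 1.x session API; the 0.333 fraction is just the example value from the comment, and note that it caps memory on every visible GPU rather than selecting a single one):

import tensorflow as tf

# Limit this process to roughly a third of the memory on every visible GPU
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
config = tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True)

with tf.Session(config=config) as sess:
    ...  # training code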

2 Answers


I had this problem myself. Setting config.gpu_options.allow_growth = True did not do the trick, and all of the GPU memory was still consumed by Tensorflow. The way around it is the undocumented environment variable TF_FORCE_GPU_ALLOW_GROWTH (I found it in https://github.com/tensorflow/tensorflow/blob/3e21fe5faedab3a8258d344c8ad1cec2612a8aa8/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc#L25).

Setting TF_FORCE_GPU_ALLOW_GROWTH=true works perfectly.

In the Python code, you can set

os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
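For context, a minimal sketch of how this is typically wired up (an assumption on my part: the variable has to be set before TensorFlow initializes its GPU devices, so it is set before the import; TF 1.x session API shown):

import os

# Must be set before TensorFlow touches the GPUs, i.e. before the first session is created
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf

with tf.Session() as sess:
    ...  # GPU memory now grows on demand instead of being fully pre-allocated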
    Great! But note that if you are using SLURM, you will need to add "export TF_FORCE_GPU_ALLOW_GROWTH=true" to your sbatch script, because you won't be able to set environment variables from within python! – midawn98 Apr 28 '20 at 21:07

I can offer you a method mask_busy_gpus defined here: https://github.com/yselivonchyk/TensorFlow_DCIGN/blob/master/utils.py

A simplified version of the function:

import subprocess as sp
import os

def mask_unused_gpus(leave_unmasked=1):
  # Hide all but `leave_unmasked` idle GPUs from TensorFlow via CUDA_VISIBLE_DEVICES.
  ACCEPTABLE_AVAILABLE_MEMORY = 1024  # MiB of free memory required to treat a GPU as idle
  COMMAND = "nvidia-smi --query-gpu=memory.free --format=csv"

  try:
    # Query free memory per GPU; drop the CSV header and the trailing empty line
    _output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
    memory_free_info = _output_to_list(sp.check_output(COMMAND.split()))[1:]
    memory_free_values = [int(x.split()[0]) for x in memory_free_info]
    available_gpus = [i for i, x in enumerate(memory_free_values) if x > ACCEPTABLE_AVAILABLE_MEMORY]

    if len(available_gpus) < leave_unmasked:
      raise ValueError('Found only %d usable GPUs in the system' % len(available_gpus))
    # Expose only the first `leave_unmasked` idle GPUs to this process
    os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, available_gpus[:leave_unmasked]))
  except Exception as e:
    print('"nvidia-smi" is probably not installed. GPUs are not masked', e)

Usage:

mask_unused_gpus()
with tf.Session() as sess:
  ...  # only the unmasked GPU(s) are visible to TensorFlow

Prerequisites: nvidia-smi

With this script I was solving the following problem: on a multi-GPU cluster, use only a single (or an arbitrary) number of GPUs and let them be allocated automatically.

Shortcoming of the script: if you start multiple scripts at once, they might be assigned the same GPU, because the script relies on memory allocation as the idleness signal and memory allocation takes a few seconds to kick in (a possible mitigation is sketched below).
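One possible mitigation (my own sketch, not part of the linked utils.py; mask_unused_gpus_random is a hypothetical name): pick the GPUs to keep at random from the idle ones, so that jobs launched at the same moment are less likely to collide on the same device.

import os
import random
import subprocess as sp

def mask_unused_gpus_random(leave_unmasked=1, acceptable_free_mb=1024):
  # Hypothetical variant of mask_unused_gpus: choose random idle GPUs
  # instead of always the first ones, to reduce collisions between jobs
  # that start before each other's memory allocation has kicked in.
  command = "nvidia-smi --query-gpu=memory.free --format=csv"
  lines = sp.check_output(command.split()).decode('ascii').split('\n')[1:-1]
  free_mb = [int(line.split()[0]) for line in lines]
  available_gpus = [i for i, mb in enumerate(free_mb) if mb > acceptable_free_mb]

  if len(available_gpus) < leave_unmasked:
    raise ValueError('Found only %d usable GPUs in the system' % len(available_gpus))
  chosen = random.sample(available_gpus, leave_unmasked)
  os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, chosen))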
