I have pretty much the same question as one that has already been answered here, with one slight difference:
I'm working on a server with a few GPUs that I share with my colleagues for training our deep learning models. The server should also run a small web application that samples from our models. The sampling script uses the relatively new eager execution. In theory, this should let me stop TensorFlow from allocating all the GPU memory by passing a configuration like this:
import tensorflow as tf

# allow_growth should make TensorFlow allocate GPU memory on demand instead of grabbing it all
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)
In practice this does not work, though. The documentation for eager execution also states that not all configuration options that work for sessions will work in eager execution (here). But how can I limit the memory usage then?
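For context, with a regular graph-mode session I could cap how much memory the process takes via the per-process memory fraction option. A rough sketch (the 0.2 fraction is just an example value):

import tensorflow as tf

# graph-mode way of capping GPU memory: let this session use at most ~20% of the GPU
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
sess = tf.Session(config=config)

As far as I can tell, this session-based option does not carry over to eager execution either.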
I know that I can limit the visible devices like this:
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
But I don't want to permanently block an entire GPU for a task that is called only occasionally and actually needs far less memory.