
I have one GPU at my disposal for deployment, but multiple models need to be deployed. I don't want to allocate the full GPU memory to the first deployed model, because then I can't deploy my subsequent models. During training, this can be controlled with the gpu_memory_fraction parameter. I am using the following command to deploy my model:

tensorflow_model_server --port=9000 --model_name=<name of model> --model_base_path=<path where exported models are stored> &> <log file path>
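
For reference, during training I limit the allocation roughly like this (TF 1.x; the 0.3 fraction is just an example value):

    import tensorflow as tf

    # Limit this process to ~30% of the GPU's memory (example value)
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))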

Is there a flag that I can set to control the GPU memory allocation?

Thanks

dragster
  • Does [this](https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory?rq=1) help? – Imran Dec 01 '17 at 08:17
  • @Imran No, my query is regarding memory allocation inside TensorFlow Serving. – dragster Dec 01 '17 at 17:00
  • You can find an open bug here: https://github.com/tensorflow/serving/issues/249. TL;DR: there doesn't seem to be an option, and apparently you will have to change the setting manually and recompile the binary, as explained in the post I linked. – rajat Dec 04 '17 at 20:19

2 Answers


Newer versions of TF Serving allow setting the per_process_gpu_memory_fraction flag; it was added in this pull request.
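
Assuming your tensorflow_model_server build includes that change, usage would look roughly like this (0.5 is just an example value):

    tensorflow_model_server --port=9000 --model_name=<name of model> --model_base_path=<path where exported models are stored> --per_process_gpu_memory_fraction=0.5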

Dat

I have just added a flag to configure the GPU memory fraction: https://github.com/zhouyoulie/serving

John Zhou