I'm using an NVIDIA/CUDA container to host my Django website. It's a new deployment of an old site that previously scored its models on the CPU. The rationale for the NVIDIA/CUDA image is to accelerate scoring when an analysis is requested through the Django interface.
The difficulty I'm running into is that my docker-compose build produces GPU out-of-memory errors. I hadn't anticipated that Celery / Django would load the models onto the GPU ahead of any actual scoring call, or that this would consume so much memory. As a result, GPU memory is exhausted quickly and the website fails to launch properly.
My question is whether there are ways to manage the GPU memory more effectively. Currently I load the TensorFlow models when Django's settings.py is evaluated. Because the Celery workers import the same settings, this effectively doubles the GPU memory demand. At runtime, most (but not all) of the models are scored through Celery tasks.
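For context, the loading currently looks roughly like this (model names and paths below are placeholders, not my actual code):

```python
# settings.py (simplified; model names/paths are placeholders)
import tensorflow as tf

# Both the Django process and every Celery worker process import settings,
# so each one loads its own copy of the models onto the GPU at startup.
SCORING_MODELS = {
    "model_a": tf.keras.models.load_model("/models/model_a"),
    "model_b": tf.keras.models.load_model("/models/model_b"),
}
```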
Some options I am considering:
- Stripping non-scoring components of the TensorFlow models that take up unnecessary space;
- Passing an environment variable to distinguish Celery from Django processes so each loads only the models it needs (see the sketch after this list);
- Limiting TensorFlow model complexity to reduce their size;
- Reducing Celery concurrency (was 10; now set to 1) to limit duplication in GPU memory.
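For the environment-variable option, I was picturing something like the following. `IS_CELERY_WORKER` is a name I made up; it would be set only in the Celery service's `environment:` block in docker-compose.yml, and this is a sketch rather than working code:

```python
# settings.py (sketch; IS_CELERY_WORKER is a placeholder env var set per container)
import os
import tensorflow as tf

IS_CELERY_WORKER = os.environ.get("IS_CELERY_WORKER") == "1"

def _load(path):
    # compile=False skips restoring optimizer/training state we don't need for scoring
    return tf.keras.models.load_model(path, compile=False)

if IS_CELERY_WORKER:
    # Celery workers load the full set of scoring models.
    SCORING_MODELS = {
        "model_a": _load("/models/model_a"),
        "model_b": _load("/models/model_b"),
    }
else:
    # The Django web process only loads the few models it scores directly.
    SCORING_MODELS = {
        "model_c": _load("/models/model_c"),
    }
```

The concurrency change is just the worker invocation, e.g. `celery -A myproject worker --concurrency=1` (with `myproject` standing in for my actual app name).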
Are there any other approaches that people have used?