I'm using an NVIDIA/CUDA container to host my Django website. It's a new deployment of an old site that previously scored its models on the CPU. The rationale for the NVIDIA/CUDA image is to accelerate scoring when an analysis is requested through the Django interface.
The difficulty I'm running into is that my docker-compose build produces GPU out-of-memory errors. I hadn't anticipated that Celery / Django would load the models onto the GPU ahead of any actual scoring call, or that this would consume so much memory. As a result, GPU memory is exhausted quickly and the website fails to launch properly.
My question is whether there are ways to manage the GPU memory more effectively. Currently I load the TensorFlow models when Django's settings.py is evaluated. Because the Celery workers import the same settings, this effectively doubles the GPU memory demand. At runtime, most (but not all) of the models are scored through Celery tasks.
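For context, the loading currently looks roughly like this (model names and paths below are placeholders, not my actual code):

```python
# settings.py (simplified; model names/paths are placeholders)
import tensorflow as tf

# Both the Django process and every Celery worker process import settings,
# so each one loads its own copy of the models onto the GPU at startup.
SCORING_MODELS = {
    "model_a": tf.keras.models.load_model("/models/model_a"),
    "model_b": tf.keras.models.load_model("/models/model_b"),
}
```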
Some options I am considering:
- Stripping non-scoring components of the TensorFlow models that take up unnecessary space;
- Passing an environment variable to distinguish Celery from Django processes so each loads only the models it needs (see the sketch after this list);
- Limiting TensorFlow model complexity to reduce their size;
- Reducing Celery concurrency (was 10; now set to 1) to limit duplication in GPU memory.
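For the environment-variable option, I was picturing something like the following. `IS_CELERY_WORKER` is a name I made up; it would be set only in the Celery service's `environment:` block in docker-compose.yml, and this is a sketch rather than working code:

```python
# settings.py (sketch; IS_CELERY_WORKER is a placeholder env var set per container)
import os
import tensorflow as tf

IS_CELERY_WORKER = os.environ.get("IS_CELERY_WORKER") == "1"

def _load(path):
    # compile=False skips restoring optimizer/training state we don't need for scoring
    return tf.keras.models.load_model(path, compile=False)

if IS_CELERY_WORKER:
    # Celery workers load the full set of scoring models.
    SCORING_MODELS = {
        "model_a": _load("/models/model_a"),
        "model_b": _load("/models/model_b"),
    }
else:
    # The Django web process only loads the few models it scores directly.
    SCORING_MODELS = {
        "model_c": _load("/models/model_c"),
    }
```

The concurrency change is just the worker invocation, e.g. `celery -A myproject worker --concurrency=1` (with `myproject` standing in for my actual app name).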
Are there any other approaches that people have used?