
I am running into a strange issue with TensorFlow (2.9.1): after defining a distributed training strategy, the memory on both GPUs fills up almost completely.

Steps to reproduce are simple:

import tensorflow as tf
strat = tf.distribute.MirroredStrategy()

After the first line (importing TensorFlow), nvidia-smi outputs:

Fri Jun 10 03:01:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:04:00.0 Off |                  Off |
| 26%   25C    P8     9W / 250W |      0MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P6000        Off  | 00000000:06:00.0 Off |                  Off |
| 26%   20C    P8     7W / 250W |      0MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

After the second line of code, nvidia-smi outputs:

Fri Jun 10 03:02:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:04:00.0 Off |                  Off |
| 26%   29C    P0    59W / 250W |  23951MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P6000        Off  | 00000000:06:00.0 Off |                  Off |
| 26%   25C    P0    58W / 250W |  23951MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1833720      C   python                          23949MiB |
|    1   N/A  N/A   1833720      C   python                          23949MiB |
+-----------------------------------------------------------------------------+

Why is the GPU memory almost entirely full after just creating the strategy? There is also some terminal output:

2022-06-10 03:02:37.442336: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-10 03:02:39.136390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 23678 MB memory:  -> device: 0, name: Quadro P6000, pci bus id: 0000:04:00.0, compute capability: 6.1
2022-06-10 03:02:39.139204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 23678 MB memory:  -> device: 1, name: Quadro P6000, pci bus id: 0000:06:00.0, compute capability: 6.1
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')

Any ideas on why this is occurring would be helpful! Additional details about my configuration:

  • Python 3.10.4 [GCC 7.5.0] on linux
  • tensorflow 2.9.1
  • cuda/11.2.2 cudnn/v8.2.1
– sandsoft

1 Answer


By default, TensorFlow maps almost all of the memory on every visible GPU as soon as the devices are initialized (see the official guide). This is done for performance reasons: grabbing the memory up front avoids the allocation latency and fragmentation that growing the memory pool on demand would cause.

You can try using tf.config.experimental.set_memory_growth to prevent TensorFlow from immediately claiming all of the memory. There are also some good explanations in this StackOverflow post.
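For example (a minimal sketch, not tested against this exact setup; memory growth has to be set before the GPUs are initialized, i.e. before the MirroredStrategy is created):

import tensorflow as tf

# Enable memory growth on every visible GPU. This must run before any op or
# strategy initializes the devices, otherwise TensorFlow raises a RuntimeError.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

strat = tf.distribute.MirroredStrategy()

With growth enabled, nvidia-smi should show only what the process actually uses rather than nearly the full 24 GiB per card.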

– M Z
  • So how do people typically evaluate the batch size they can handle if you can't accurately see a memory footprint? – sandsoft Jun 10 '22 at 07:19
  • Your first indicator for what batch size to use shouldn't just be available memory. Given the choice and the ability to do so, you should tune your batch size as a hyperparameter. The batch size you choose should be specific to your dataset and model. You should take a moment to look at hyperparameter tuning and selecting batch size: it will legitimately affect your model's performance. Picking the largest batch size that will fit in your memory might not always be the best option. – M Z Jun 10 '22 at 07:22
  • 1
    That being said, you *can* do as suggested above and limit your growth. It will give you a rough idea of how much memory is going to be needed. – M Z Jun 10 '22 at 07:23
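To make that last comment concrete, here is a rough sketch of one way to check the footprint once memory growth is enabled (tf.config.experimental.get_memory_info is available in recent TF 2.x releases; the training step is only a placeholder):

import tensorflow as tf

for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

strat = tf.distribute.MirroredStrategy()
# ... build the model under strat.scope() and run a few training steps ...

# Reports the current and peak bytes TensorFlow has allocated on the device,
# which gives a rough upper bound for sizing the batch.
print(tf.config.experimental.get_memory_info('GPU:0'))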