I ran the MNIST demo in TensorFlow with 2 conv layers and a fully connected layer, and I got a message that it 'ran out of memory trying to allocate 2.59GiB', but it also reports that total memory is 4.69GiB and free memory is 3.22GiB. How can it fail on 2.59GiB? And with a larger network, how can I manage GPU memory? I'm only concerned with how to make the best use of GPU memory and want to understand how this happened, not how to pre-allocate memory.
-
Possible duplicate of [How to prevent tensorflow from allocating the totality of a GPU memory?](http://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory) – Sung Kim Apr 29 '16 at 02:50
-
I saw it before, but it refers to pre-allocating GPU memory, not a lack of memory – Fangxin Apr 29 '16 at 06:08
11 Answers
I was encountering out of memory errors when training a small CNN on a GTX 970. Through somewhat of a fluke, I discovered that telling TensorFlow to allocate memory on the GPU as needed (instead of up front) resolved all my issues. This can be accomplished using the following Python code:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
Previously, TensorFlow would pre-allocate ~90% of GPU memory. For some unknown reason, this would later result in out-of-memory errors even though the model could fit entirely in GPU memory. By using the above code, I no longer have OOM errors.
Note: If the model is too big to fit in GPU memory, this probably won't help!
-
@nickandross "For some unknown reason..." I just wanted to add that the reason is to avoid unnecessary/additional data transfers from main RAM to GPU memory, since data transfer is much slower than the computation itself and can become the bottleneck. It can therefore save time to transfer as much data as possible up front, instead of allocating a little, computing some (fast), waiting for more data to be transferred (relatively slowly), computing again, then waiting again for more data to arrive at the GPU, etc... – Elegant Code Jan 23 '20 at 23:25
-
@Schütze See my answer below - perhaps your problem is you're using Tensorflow 2? – ssp Mar 10 '20 at 19:53
-
Neat! Can this be done in the Tensorflow.js Javascript bindings too? – starbeamrainbowlabs Feb 10 '21 at 17:44
-
@ElegantCode Just for the record, "for some unknown reason..." is referencing the second part of the sentence (the out-of-memory errors), not the previous sentence (talking about pre-allocation). – Feb 10 '21 at 19:45
-
@nickandross , Thanks for pointing that out. By the way, since I wrote that comment, I'm a year older, and a year wiser (whatever that means...), and I wanted to add, that pre-allocation also helps to avoid memory fragmentation, and in theory, should help reduce memory usage (contrary to what you've experienced... I have no idea why it works for you without pre-allocation but not with it; then again, those are bugs that are probably long resolved in the newer TF). – Elegant Code Feb 11 '21 at 00:22
-
Yeah, that's what was so confusing to me at the time. Like you said, I'm guessing whatever was happening back then is no longer applicable given how the library has been restructured in 2.0. – Feb 11 '21 at 00:58
-
I get: AttributeError: module 'tensorflow' has no attribute 'ConfigProto'. Why is that? – Viktor Feb 07 '23 at 14:12
It's not about that. First of all, you can see how much memory it actually uses while it runs by monitoring your GPU; for example, with an NVIDIA GPU you can check it with the watch -n 1 nvidia-smi command.
But in most cases, if you don't set a maximum fraction of GPU memory, TensorFlow allocates almost all of the free memory. Your problem is simply a lack of memory for your GPU. CNNs are quite heavy, so when feeding your network, DO NOT feed it your whole dataset at once; do the feeding in small batch sizes, as in the sketch below.
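For example, here is a minimal tf.keras sketch of feeding a small MNIST-style CNN in batches rather than all at once (the layer sizes and batch size are illustrative, not the asker's exact model):
import tensorflow as tf

# A minimal sketch, assuming a small MNIST-style CNN built with tf.keras;
# the architecture and batch size here are illustrative only.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Feed the data in small batches rather than the whole training set at once;
# only one batch (plus the model and its activations) has to fit in GPU memory.
model.fit(x_train, y_train, batch_size=32, epochs=1)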

-
I have a rather large network (CNN+LSTM). My input data is of size, batch_size = 5, (5x396x396) -- it's a 3D volume. So a rather small batch size. I'm running on a GTX 1070 with 8GB RAM, but I'm still running out of memory. Are there any workarounds you know of? Any tutorials that outline workarounds? – Kendall Weihe Aug 11 '16 at 19:02
-
Then how do I pass small batch sizes? I use `train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size=batch_size)` and then iterate as `for x_batch, y_batch in train_dataset.prefetch(tf.data.experimental.AUTOTUNE).cache():` and yet running out of memory. Any ideas? I don't want to post a question with the same title but no response in here worked for me. – J Agustin Barrachina Jun 22 '20 at 15:53
-
Do not use `cache()`. When you use it, everything before it in the pipeline is kept in memory. For a dataset like that, avoid caching; it gives a significant speed boost, but only if you have enough RAM to hold the data. – ayush thakur Feb 28 '21 at 19:12
By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation.
TensorFlow provides two Config options on the Session to control this.
The first is the allow_growth option, which attempts to allocate only as much GPU memory as is needed for runtime allocations:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, you can tell TensorFlow to only allocate 40% of the total memory of each GPU by:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)

-
A small note... this information is obtained via the tensorflow guide on how to use it with gpu: https://www.tensorflow.org/guide/using_gpu – zwep Sep 21 '18 at 15:30
Tensorflow 2
Since sessions no longer exist in TensorFlow 2, the solution above is no longer viable.
By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process.
In some cases, it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as it is needed by the process. TensorFlow provides two methods to control this. One of them is tf.config.experimental.set_memory_growth.
For a full understanding, I recommend this link: Limiting GPU memory growth
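A minimal sketch of that approach (following the idea in the linked guide; it must run before any GPU work starts):
import tensorflow as tf

# Ask TensorFlow to grow its GPU allocation on demand instead of grabbing
# nearly all of the memory up front.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth cannot be changed after the GPU has been initialized
        print(e)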

From the TensorFlow guide:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
Adjust memory_limit=*value* to something reasonable for your GPU. For example, with a 1070 Ti accessed from an NVIDIA Docker container and remote screen sessions, memory_limit=7168 gave no further errors. Just make sure sessions on the GPU are cleared occasionally (e.g. by restarting the Jupyter kernel).

For Tensorflow 2:
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)

Use TensorFlow Dataset objects. This is a high-performance option that is more suitable for datasets that do not fit in memory and that are streamed from disk or from a distributed filesystem.
If you have a large dataset and you are training on GPU(s), consider using Dataset objects, since they will take care of performance-critical details, such as:
- Asynchronously preprocessing your data on the CPU while your GPU is busy, and buffering it into a queue.
- Prefetching data into GPU memory so it's immediately available when the GPU has finished processing the previous batch, so you can reach full GPU utilization.
- tf.keras.preprocessing.image_dataset_from_directory turns image files sorted into class-specific folders into a labeled dataset of image tensors.
Resource: https://keras.io/getting_started/intro_to_keras_for_engineers/
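For instance, a minimal sketch of such a pipeline (the directory path and image size below are placeholders):
import tensorflow as tf

# Hypothetical directory of images sorted into one sub-folder per class.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "path/to/train_images",   # placeholder path
    image_size=(180, 180),
    batch_size=32)

# Prefetch the next batch while the GPU works on the current one,
# so the GPU does not sit idle waiting for data.
train_ds = train_ds.prefetch(tf.data.experimental.AUTOTUNE)

# A tf.keras model can then be trained directly on the dataset:
# model.fit(train_ds, epochs=10)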

Reducing the batch size from 32 to 8 worked for me.
-
Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jun 27 '22 at 13:35
Before delving into other possible explanations like the ones mentioned above, please check that there is no hung process reserving GPU memory. It just happened to me that my TensorFlow script hung on some error, but I did not notice it because I was monitoring running processes with nvidia-smi: the hung script did not show up in nvidia-smi's output but was still reserving GPU memory. Killing the hung scripts (TensorFlow typically spawns as many as there are GPUs in the system) completely solved a similar problem (after I had exhausted all the TF wizardry).

For Tensorflow 2 or Keras:
import tensorflow as tf
from tensorflow.python.framework.config import set_memory_growth

tf.compat.v1.disable_v2_behavior()
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Enable on-demand memory growth for every visible GPU
        for gpu in gpus:
            set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)

I realize you guys are all TensorFlow fans and unlikely to try other frameworks, but we addressed the GPU memory issue from the ground up when designing our system. We allocate as much as possible on the GPU, but if a few layers don't fit, they go on the CPU. In particular, we needed many DNNs loaded at once for evaluation once deployed in a medical device. To achieve this we have dedicated eval libraries that reuse GPU memory as much as possible: during eval you don't need all the previous calculations for many of the layers, so that memory can be dynamically freed, or reused directly if it is already big enough. Currently we have 5 image segmentation DNNs loaded and running simultaneously; any one of them completely fills the GPU when training, but all fit in 4GB in eval mode.
-
I can accept the first paragraph as enough of an answer. But please delete the last line yourself, in order to avoid a malicious impression. Consider taking the [tour] and reading [answer] please. – Yunnosch Sep 24 '22 at 18:19
-