
I'm creating a tf.data.Dataset inside a for loop and I noticed that the memory was not freed as one would expect after each iteration.

Is there a way to request from TensorFlow to free the memory?

I tried using `tf.reset_default_graph()` and calling `del` on the relevant Python objects, but this does not work.

The only thing that seems to work is `gc.collect()`. Unfortunately, `gc.collect()` does not work in some more complex examples.
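As background on why `del` alone can fail while `gc.collect()` helps: objects caught in reference cycles are not freed by reference counting; only the cyclic garbage collector reclaims them. A minimal, TF-free sketch (the `Node` class is made up for illustration):

```python
import gc

class Node:
    """A toy object that participates in a reference cycle."""
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a   # reference cycle: refcounts never drop to 0
    return a

gc.disable()                  # mimic a state where cyclic garbage piles up
obj = make_cycle()
del obj                       # the two Nodes survive because of the cycle
leaked = sum(isinstance(o, Node) for o in gc.get_objects())
gc.collect()                  # the cycle collector reclaims them
remaining = sum(isinstance(o, Node) for o in gc.get_objects())
gc.enable()
print(leaked, remaining)
```

This is only an analogy for the pure-Python side; memory held by the TensorFlow runtime itself is not governed by Python's garbage collector.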

Fully reproducible code:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import psutil
%matplotlib inline

memory_used = []
for i in range(500):
    data = tf.data.Dataset.from_tensor_slices(
                    np.random.uniform(size=(10, 500, 500)))\
                    .prefetch(64)\
                    .repeat(-1)\
                    .batch(3)
    data_it = data.make_initializable_iterator()
    next_element = data_it.get_next()

    with tf.Session() as sess:
        sess.run(data_it.initializer)
        sess.run(next_element)
    memory_used.append(psutil.virtual_memory().used / 2 ** 30)
    tf.reset_default_graph()

plt.plot(memory_used)
plt.title('Evolution of memory')
plt.xlabel('iteration')
plt.ylabel('memory used (GB)')

[Plot: evolution of memory usage over iterations]

– BiBi

5 Answers

Answer (score: 2)

The issue is that each iteration adds new nodes to the graph to define the dataset and iterator. A simple rule of thumb: never define new TensorFlow ops inside a loop. To fix it, move

data = tf.data.Dataset.from_tensor_slices(
            np.random.uniform(size=(10, 500, 500)))\
            .prefetch(64)\
            .repeat(-1)\
            .batch(3)
data_it = data.make_initializable_iterator()
next_element = data_it.get_next()

outside the for loop and just call `sess.run(next_element)` to fetch the next example. Once you've gone through all the training/eval examples, call `sess.run(data_it.initializer)` to reinitialize the iterator.

– Dpk (answer edited by Werner Henze)
Answer (score: 2)

This fix worked for me when I had a similar issue with TF 2.4:

  1. sudo apt-get install libtcmalloc-minimal4
  2. LD_PRELOAD=/path/to/libtcmalloc_minimal.so.4 python example.py
– joakimedin
Answer (score: 0)

If you only need to create the dataset and then save it to disk in the loop body, for example when preprocessing a large amount of data in smaller parts to avoid running out of memory, launch the loop body in a subprocess. When the subprocess exits, all of its memory is returned to the operating system.

This answer describes how to launch subprocesses in general.

– Tonnz
Answer (score: -1)

The Dataset API handles iteration via its built-in iterator, at least while eager mode is off (i.e. before TF 2.0). So there's simply no need to create the dataset object from a NumPy array inside the for loop: `from_tensor_slices` writes the values into the graph as a `tf.constant`, and that is what grows on every iteration. This is not the case with `tf.data.TFRecordDataset()`, so if you convert your data to the TFRecord format and create the dataset inside the for loop, it won't leak memory.

for i in range(500):
    data = tf.data.TFRecordDataset('file.tfrecords')\
        .prefetch(64)\
        .repeat(-1)\
        .batch(1)
    data_it = data.make_initializable_iterator()
    next_element = data_it.get_next()
    with tf.Session() as sess:
        sess.run(data_it.initializer)
        sess.run(next_element)
    memory_used.append(psutil.virtual_memory().used / 2 ** 30)
    tf.reset_default_graph()

But as I said, there's no need to create the dataset inside a loop.

data = tf.data.Dataset.from_tensor_slices(
                    np.random.uniform(size=(10, 500, 500)))\
                    .prefetch(64)\
                    .repeat(-1)\
                    .batch(3)
data_it = data.make_initializable_iterator()
next_element = data_it.get_next()

for i in range(500):
    with tf.Session() as sess:
        ...
– Sharky
  • I'm familiar with the `tf.data` API, my question is on a different point, i.e. explicitly freeing memory allocated by TensorFlow with `tf.data`. – BiBi Mar 17 '19 at 22:07
  • Take a look at this, it specifically points out why it's not a good idea to try to free the memory: https://github.com/tensorflow/tensorflow/issues/14181 – Sharky Mar 18 '19 at 10:27
Answer (score: -1)

You are creating a new Python object (the dataset) on every iteration of the loop, and it looks like the garbage collector is not being invoked. Add an explicit garbage collection call and the memory usage should be fine.

Other than that, as mentioned in another answer, keep building the data object and the session outside of the loop.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import psutil
import gc

%matplotlib inline

memory_used = []
for i in range(100):
    data = tf.data.Dataset.from_tensor_slices(
                    np.random.uniform(size=(10, 500, 500)))\
                    .prefetch(64)\
                    .repeat(-1)\
                    .batch(3)
    data_it = data.make_initializable_iterator()
    next_element = data_it.get_next()

    with tf.Session() as sess:
        sess.run(data_it.initializer)
        sess.run(next_element)
    memory_used.append(psutil.virtual_memory().used / 2 ** 30)
    tf.reset_default_graph()
    gc.collect()

plt.plot(memory_used)
plt.title('Evolution of memory')
plt.xlabel('iteration')
plt.ylabel('memory used (GB)')


– MPękalski
    Sorry, I've just noticed that you did write that you tried `gc.collect()`. But what would be the more complex use case? – MPękalski Mar 17 '19 at 21:59
  • Yes, I tried `gc.collect()`. My more complex use case involves many `*.tfrecord` files with a quite complex data pipeline. In this more complex use case, `gc.collect()` does not work. Thus my question: how to explicitly free the memory allocated by TensorFlow. – BiBi Mar 17 '19 at 22:03
  • I read somewhere about it, and in general: with GPU memory it is not possible to free it; with Python it has to be `gc.collect()`, but then you run into the numerous typical issues with Python not wanting to release memory back to the OS. There is a big thread about it on Stack Overflow. – MPękalski Mar 17 '19 at 22:19
  • Here, I’m referring to the general RAM, not the memory of the GPU. – BiBi Mar 17 '19 at 22:27
  • I think you need to describe your use case more precisely. For instance, do you need to run through all those different TFRecord files at once or not? – MPękalski Mar 17 '19 at 22:31