
I have a memory leak with TensorFlow. I referred to Tensorflow : Memory leak even while closing Session? to address my issue, and I followed the advice in the answer, which seemed to solve the problem. However, it does not work here.

In order to recreate the memory leak, I have created a simple example. First, I use this function (which I got from How to get current CPU and RAM usage in Python?) to check the memory use of the Python process:

def memory():
    import os
    import psutil
    pid = os.getpid()
    py = psutil.Process(pid)
    memory_use = py.memory_info().rss / 2.**30  # resident set size, in GiB
    print('memory use:', memory_use)

Then, every time I call the build_model function, the memory use increases.

Here is the build_model function that has the memory leak:

def build_model():
    '''Model'''
    tf.reset_default_graph()

    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)

        labels = tf.placeholder(tf.float32, shape=(None, 1))
        input = tf.placeholder(tf.float32, shape=(None, 1))

        x = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense1')(input)
        x1 = tf.contrib.keras.layers.Dropout(0.5)(x)
        x2 = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense2')(x1)
        y = tf.contrib.keras.layers.Dense(1, activation='sigmoid', name='dense3')(x2)

        loss = tf.reduce_mean(tf.contrib.keras.losses.binary_crossentropy(labels, y))
        train_step = tf.train.AdamOptimizer(0.004).minimize(loss)

        # Initialize all variables
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        sess.close()

    tf.reset_default_graph()

    return

I would have thought that using the block `with tf.Graph().as_default(), tf.Session() as sess:` and then closing the session and calling `tf.reset_default_graph` would clear all the memory used by TensorFlow. Apparently it does not.

The memory leak can be recreated as follows:

memory()
build_model()
memory()
build_model()
memory()

The output of this is (on my computer):

memory use: 0.1794891357421875
memory use: 0.184417724609375
memory use: 0.18923568725585938

Clearly, not all of the memory used by TensorFlow is freed afterwards. Why?

I plotted the memory use over 100 iterations of calling build_model, and this is what I get:

[Plot: memory use over 100 iterations]

I think that goes to show that there is a memory leak.

Syzygy
  • What is the error message you are getting? – Shamane Siriwardhana Jun 04 '17 at 10:20
  • There is no error message. The issue is that memory leaks each time I call the function `build_model`. – Syzygy Jun 04 '17 at 10:24
  • In the graph, what is the X axis? Is it the number of times you executed build_model? – Shamane Siriwardhana Jun 04 '17 at 12:04
  • Yes, exactly. It's the number of times `build_model` was called. – Syzygy Jun 04 '17 at 12:05
  • So what is happening is that it keeps adding memory on each iteration and not releasing it, right? Normally TF loads all the operations into the graph first and then executes them in a session. Here you create a new session for each iteration, right? – Shamane Siriwardhana Jun 04 '17 at 12:21
  • Yes, it keeps using more and more memory without releasing it afterwards. And yes, at each iteration I create a new session and then close it. I also call `tf.reset_default_graph` at each iteration, which should release any memory used by Tensorflow. – Syzygy Jun 04 '17 at 12:41
  • You don't have to use `sess.close()` when using a `with` statement. And the only thing being evaluated is the variable initializer, right? – Shamane Siriwardhana Jun 04 '17 at 12:45
  • Yes. And yes, you're right, I don't have to close the session. – Syzygy Jun 04 '17 at 12:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/145803/discussion-between-syzygy-and-shamane-siriwardhana). – Syzygy Jun 04 '17 at 13:25
  • Did you find a solution to this problem? I have the same problem! @Syzygy – Mehran Jun 15 '17 at 19:25
  • Check this: https://github.com/tensorflow/tensorflow/issues/10408. In fact you'll need to have Tensorflow 1.12 and call `K.clear_session`. – Syzygy Jun 15 '17 at 19:31

5 Answers

4

The problem was due to Tensorflow version 0.11. As of today Tensorflow 0.12 is out and the bug is resolved. Upgrade to a newer version and it should work as expected. Don't forget to call tf.contrib.keras.backend.clear_session() at the end.
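
A minimal sketch of that fix (assuming TF 1.x with the contrib Keras backend, and the rest of build_model as in the question):

import tensorflow as tf

def build_model():
    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)
        # ... placeholders, layers, loss and train_step as in the question ...
        sess.run(tf.global_variables_initializer())
    tf.contrib.keras.backend.clear_session()  # release the state Keras keeps between builds
    tf.reset_default_graph()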

rerx
  • The problem persists in newer versions: https://stackoverflow.com/questions/53687165/tensorflow-memory-leak-when-building-graph-in-a-loop – Safoora Yousefi Dec 11 '18 at 15:29
  • @SafooraYousefi in the question you linked, the poster does not reset the graph between iterations. As of tensorflow 1.13, this solution still works (`tf.reset_default_graph`). – BiBi Mar 17 '19 at 18:52
2

I had this same problem. Tensorflow (v2.0.0) was consuming ~0.3 GB every epoch in an LSTM model I was training. I discovered that the callback hooks were the main culprit. I removed the tensorboard callback and it worked fine afterwards:

history = model.fit(
    train_x,
    train_y,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_data=(test_x, test_y),
    callbacks=[checkpoint],  # tensorboard callback removed to stop the leak
)
0

Normally we keep the loop outside of the session. I think what is happening here is that each call adds more memory when running init_op = tf.global_variables_initializer(), because if the loop is outside the session the variables only get initialized once. Here they get initialized on every call, and that memory is kept.

Edit, since you still have the memory issue:

Possibly it's the graph: each call creates a new graph, which holds on to memory. Try removing the explicit graph and running again; by removing it, all your operations go onto the default graph. I think you need some kind of memory-flush function outside of TensorFlow, because each time you run this it stacks up another graph.
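
For what it's worth, a rough, hypothetical sketch of that "flush from Python" idea: drop every reference to the graph and session, then force a garbage-collection pass (this may or may not reclaim the memory on old TF versions):

import gc
import tensorflow as tf

def build_and_discard():
    graph = tf.Graph()
    with graph.as_default(), tf.Session(graph=graph) as sess:
        pass  # ... build ops and run the session as in the question ...
    del graph, sess  # drop the Python references to the graph and session
    gc.collect()     # force a collection pass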

Shamane Siriwardhana
  • Unfortunately `tf.global_variables_initializer()` is **not** the source of the problem. You can re-create the same memory leak even if you remove `init_op = tf.global_variables_initializer()` and `sess.run(init_op)`. – Syzygy Jun 04 '17 at 13:13
  • That means even when you run the graph without a session? – Shamane Siriwardhana Jun 04 '17 at 13:30
  • Yes. Let's not keep adding comments. We can talk in chat. – Syzygy Jun 04 '17 at 13:33
  • So here you are building a graph at each iteration. Normally we initialize the graph before the loop. – Shamane Siriwardhana Jun 04 '17 at 14:09
  • I need to initialize a new graph at each loop. This is a simple example of a broader project where I need to do just that. – Syzygy Jun 04 '17 at 16:51
  • Oh, that's serious. Do you also want to keep the memory of each graph, or are you OK with discarding the graph's memory allocation after each loop? – Shamane Siriwardhana Jun 04 '17 at 17:36
  • After each iteration I want to discard the graph memory allocation. I need to completely wipe the memory used by the graph after it is used. – Syzygy Jun 04 '17 at 17:38
  • Well, that's a bit tricky. I think you can use a Python way to release all the memory that was used to build the operations, or delete all the collection values in the graph using https://www.tensorflow.org/api_docs/python/tf/Graph#clear_collection – Shamane Siriwardhana Jun 04 '17 at 17:52
  • "I think you can use a Python way to release all the memory that was used to build the operations": How would you do that? Can you give me a code example? – Syzygy Jun 04 '17 at 18:10
  • Did you try this? https://stackoverflow.com/questions/33765336/remove-nodes-from-graph-or-reset-entire-default-graph – Shamane Siriwardhana Jun 05 '17 at 04:40
  • Please check my code before commenting. I already tried using `tf.reset_default_graph`; it is in my question. – Syzygy Jun 05 '17 at 07:17
  • Oh, sorry. Then try to use `del sess`; the `del` keyword will delete the session. – Shamane Siriwardhana Jun 05 '17 at 07:29
  • We never hit this problem since we use one session to execute our loop. This may be a serious problem: not releasing the memory after execution. Clearly a session is holding on to the memory. I also tried a few things but nothing worked. – Shamane Siriwardhana Jun 05 '17 at 07:53
  • Thanks for your help. I also posted an issue on the tensorflow repository on GitHub: https://github.com/tensorflow/tensorflow/issues/10408 Clearly there is something wrong here. Do you happen to know someone who could fix the problem? – Syzygy Jun 05 '17 at 08:09
0

I faced something similar in TF 1.12 as well. Don't create the graph and session on every iteration. Every time the graph is created and its variables initialized, you are not redefining the old graph but creating new ones, which leads to memory leaks. I was able to solve this by defining the graph once and then passing the session to my iterative logic, as sketched below.
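
A minimal sketch of that pattern (illustrative names, TF 1.x): build the graph a single time, then reuse one session inside the loop, so no new ops are ever created per iteration.

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=(None, 1))
    y = tf.layers.dense(x, 1)
    init_op = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init_op)
    for _ in range(100):  # iterative logic: only execution happens here
        sess.run(y, feed_dict={x: [[1.0]]})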

From How not to program Tensorflow:

  • Be conscious of when you’re creating ops, and only create the ones you need. Try to keep op creation distinct from op execution.
  • Especially if you’re just working with the default graph and running interactively in a regular REPL or a notebook, you can end up with a lot of abandoned ops in your graph. Every time you re-run a notebook cell that defines any graph ops, you aren’t just redefining ops—you’re creating new ones.

Also, see this great answer for better understanding.
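
To make the quoted point concrete, here is a tiny illustration (hypothetical, TF 1.x graph mode) of how re-running op-creating code grows the default graph:

import tensorflow as tf

for i in range(3):
    tf.constant(1.0)  # each call adds a brand-new op to the default graph
    print(len(tf.get_default_graph().get_operations()))  # prints 1, 2, 3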

ug2409
0

This memory leak issue was resolved in the recent stable version, Tensorflow 1.15.0. I ran the code in the question and I see almost no leak, as shown below. There were lots of performance improvements in the recent stable versions TF 1.15 and TF 2.0.

memory use: 0.4033699035644531
memory use: 0.4062042236328125
memory use: 0.4088172912597656

Please check the colab gist here. Thanks!

Vishnuvardhan Janapati