15

I'm working on Google Colab, using tf.keras and TensorFlow version 2.3.0. I'm going crazy because I can't use the model I've trained to run predictions with model.predict: it runs out of CPU RAM. I've been able to reproduce the issue with a very minimal example.

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Conv2D, Activation

matrixSide = 512 #define a big enough matrix to give memory issues

inputL = Input([matrixSide,matrixSide,12]) #create a toy model
l1 = Conv2D(32,3,activation='relu',padding='same')(inputL)
l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
l1 = Conv2D(1,1,padding='same')(l1)
l1 = Activation('linear')(l1)
model = Model(inputs=inputL, outputs=l1)


#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(60):
  print(i)
  outImm = model.predict(inImm)
# K.clear_session() #somebody suggested it...

Basically, when running on the GPU it uses 3.0 GB of CPU RAM for the first 4 iterations, then goes up to 7 GB, then to 10 GB, and then it crashes because it has exhausted all the available RAM! When running on the CPU it lasts for more iterations, sometimes even dropping from 9 GB back to 3 GB, but in the end it still crashes after 20 or so iterations.

A previous question (Keras predict loop memory leak using tf.data.Dataset but not with a numpy array) hit similar issues when using tf.data but not with numpy. Somebody on a GitHub issue for TensorFlow 1.14 suggested calling K.clear_session() in each loop iteration... but it doesn't help!

Any idea on how to fix this?


5 Answers

8

I'm using a simple solution based on the Keras docs:

For small amount of inputs that fit in one batch, directly using call() is recommended for faster execution, e.g., model(x), or model(x, training=False)

for filename in image_filenames:
  # read the data
  x = load_image(filename)

  # prediction
  output = model(x)  # executes __call__(), i.e. call(), directly

Calling the model directly via __call__(), i.e. model(x), avoids the memory leak inside the predict method, which creates a new data generator on every invocation and doesn't release its memory.
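As a concrete sketch on the question's toy model (assuming TF 2.x eager execution, with model and matrixSide taken from the question's example):

import numpy as np

# a single batch small enough to fit in memory
x = np.zeros((1, matrixSide, matrixSide, 12), dtype=np.float32)

out = model(x, training=False)  # direct call, no predict() generator machinery
out_np = out.numpy()            # model(x) returns a tf.Tensor; convert back to numpy if needed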

  • 1
    This is exactly the right solution, other answers are bad workarounds. Unfortunately I've found it after I solved the problem myself the same way :). – yǝsʞǝla Oct 15 '22 at 03:44
  • `For small amount of inputs that fit in one batch` How large is the default batch size? Can I configure it? – David H. J. Dec 16 '22 at 02:23
  • `model(input)` is faster for small input compared to `model.predict(input)` . – David H. J. Dec 16 '22 at 02:30
6

I've found a fix for the memory leak. While K.clear_session() doesn't do anything in my case, adding a garbage collection after each call with _ = gc.collect() actually does the trick! Memory usage is constant now and I can run as many predictions as I want.
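A minimal sketch of this fix applied to the question's loop (gc is in the Python standard library; model and matrixSide come from the question's example):

import gc
import numpy as np

inImm = np.zeros((64, matrixSide, matrixSide, 12))
for i in range(60):
  outImm = model.predict(inImm)
  _ = gc.collect()  # force a garbage-collection pass each iteration so leaked objects are freed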

6

This is my understanding after posting this as a bug to TensorFlow.

Changing the code to:

inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(60):
  print(i)
  tensor = tf.convert_to_tensor(inImm, dtype=tf.float32)
  outImm = model.predict(tensor)

Using tf.keras.Model.predict in a for loop with a numpy input creates a new graph every iteration, because each numpy array is treated as having a different signature. Converting the numpy array to a tensor first keeps the signature the same and avoids creating new graphs.
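Since the input never changes here, the conversion can just as well be hoisted out of the loop; a minimal sketch (again reusing model and matrixSide from the question's example):

import numpy as np
import tensorflow as tf

inImm = np.zeros((64, matrixSide, matrixSide, 12))
# convert once: every predict() call then sees a tensor with the same signature,
# so the traced graph is reused instead of being rebuilt each iteration
tensor = tf.convert_to_tensor(inImm, dtype=tf.float32)
for i in range(60):
  outImm = model.predict(tensor)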

  • 3
    Thank you, I'll try this! By the way, is there a clear explanation somewhere of what these "graphs" actually are and how they behave? And if it was a graph problem, why didn't K.clear_session() work (but gc.collect() did)? – user26067 Nov 10 '20 at 10:22
  • Does not improve the memory leak for me, but when using `model(tensor)` it works without (notably) leak. – John May 02 '23 at 10:36
4

I solved the problem by using K.clear_session(). First of all, you need to define a session before you can clear it; the purpose of this is explained in both of these, here and here.

config = tf.compat.v1.ConfigProto(log_device_placement=True)  # tf.ConfigProto/tf.Session are TF1 APIs; under TF 2.x use tf.compat.v1
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

At first, using K.clear_session() in the loop results in an error after the first prediction; in my opinion, TF loses the connection to the model. For this reason, I create a new model within every run of the loop. This negatively affects the code's speed for the first several runs, but an accumulation of RAM is prevented.

The following code contains the suggested improvements:

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Conv2D, Activation

matrixSide = 512 #define a big enough matrix to give memory issues

config = tf.compat.v1.ConfigProto(log_device_placement=True)  # TF1-style session API, via tf.compat.v1 under TF 2.x
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

def create_model(matrixSide_v):
    inputL = Input([matrixSide_v,matrixSide_v,12]) #create a toy model
    l1 = Conv2D(32,3,activation='relu',padding='same')(inputL)
    l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
    l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
    l1 = Conv2D(1,1,padding='same')(l1)
    l1 = Activation('linear')(l1)
    c_model = Model(inputs=inputL, outputs=l1)
    return c_model

#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(64):
    print(i)
    model = create_model(matrixSide)
    outImm = model.predict(inImm)
    K.clear_session()
  • In my use case I use a trained model to predict a very large number of samples. It doesn't make sense to load a model and clear session for each prediction, as it is both slow, and loading a model has also a documented memory-leak issue. Any suggestion how to apply your solution to my use case? – Itamar Katz Apr 21 '21 at 07:12
  • I agree, reloading the model each iteration will slow things up 10000 times and is not really a practical solution. – Jeremy Sep 05 '21 at 03:14
  • I had this issue in a setting where I did not create a new model in a loop but had many evaluations of a model (in a DQN reinforcement learning setting) and using `K.clear_session()` was the only thing that worked. – ljbkusters Jun 22 '23 at 07:50
0

I tried lots of methods found online and they did not work, but I finally solved this problem by using TensorFlow 1.13 rather than 2.x; the old version did help.