
I am fitting a model in a for loop, but I get an error saying my GPU's memory is full. I am using Keras in the Spyder IDE (Anaconda). My GPU is an Asus GTX 1060 6GB.

I have also tried calls like K.clear_session(), gc.collect(), tf.reset_default_graph() and del custom_model, but none of them worked. The GPU properties say 98% of the memory is full.

Nothing flushes the GPU memory except numba.cuda.close(), but after that I cannot use my GPU again. The only way to clear it is to restart the kernel and rerun my code.

I am looking for code I can add to my script so that I can run the training in a for loop and clear the GPU memory at the end of every iteration.

Timisorean
  • You should include code that reproduces the problem in your question – Dr. Snoopy Apr 02 '19 at 16:46
  • Are you creating the model inside your loop? Why then? – Simon Caby Apr 02 '19 at 19:28
  • @SimonCaby Because I'm calculating accuracy on noisy data and I need to average the train and test results over 50 runs. – Sepehr Ghafari Apr 04 '19 at 07:49
  • @MatiasValdenegro Part of my code: image_input = Input(shape=(224, 224, 3)) base_model = Xception(input_tensor=image_input, include_top=False,weights='imagenet') custom_Xception_model.compile(loss='categorical_crossentropy',optimizer='adadelta',metrics=['accuracy']) hist = base_model.fit(X,Y,epochs=2) It is simple training with Keras. I just need to run it in a loop and clear the GPU memory at the end of every iteration. – Sepehr Ghafari Apr 04 '19 at 07:51
  • OK. You should not build the model inside the loop; just load the weights and train. You should not clear the model (and the memory). – Simon Caby Apr 05 '19 at 08:42
  • @SimonCaby I don't build the model; I use pre-trained models like Xception. Even when I put only the training in the loop and don't change the weights, the model, the compile step or anything else, I get a GPU OOM error at the start of epoch 1 in the second iteration. – Sepehr Ghafari Apr 06 '19 at 04:46
  • Can you try the solution I have posted here https://stackoverflow.com/questions/61284338/keras-uses-gpu-for-first-2-epochs-then-stops-using-it/62064458#62064458? I was facing the same issue, and I also have a GeForce GTX 1060 graphics card. – silentsudo May 28 '20 at 12:27
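Simon Caby's suggestion (build the model once and only retrain it) can be sketched as follows. This is a minimal illustration assuming TensorFlow 2.x with tf.keras; build_model and repeated_runs are hypothetical names, and the tiny Dense network stands in for the Xception-based model from the question. The starting weights are snapshotted with get_weights() and restored with set_weights() before every run, so no new model is created inside the loop.

```python
import numpy as np
import tensorflow as tf


def build_model(input_dim, n_classes):
    # Hypothetical stand-in for the Xception-based model in the question.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])


def repeated_runs(x, y, n_runs, epochs):
    """Train the same model n_runs times, restarting from the same weights."""
    model = build_model(x.shape[1], y.shape[1])  # built once, outside the loop
    initial_weights = model.get_weights()        # list of NumPy arrays
    accuracies = []
    for _ in range(n_runs):
        model.set_weights(initial_weights)       # restore the starting point
        # Passing the optimizer as a string makes compile() create a fresh
        # optimizer instance, so no optimizer state carries over between runs.
        model.compile(loss="categorical_crossentropy", optimizer="adadelta",
                      metrics=["accuracy"])
        hist = model.fit(x, y, epochs=epochs, verbose=0)
        accuracies.append(hist.history["accuracy"][-1])
    return accuracies
```

The average over runs is then simply np.mean(repeated_runs(X, Y, n_runs=50, epochs=2)).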

1 Answer


Wrap the model creation and training in a function, then run that function in a subprocess. When training is done, the subprocess terminates and its GPU memory is freed.

Something like:

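import multiprocessing

def create_model_and_train():
    # Build, compile and fit the model here. Import TensorFlow/Keras inside
    # this function, not at the top of the file, so the parent process never
    # allocates GPU memory itself.
    ...

if __name__ == "__main__":  # required on Windows, where the child re-imports this file
    p = multiprocessing.Process(target=create_model_and_train)
    p.start()
    p.join()  # the child's GPU memory is freed when it exits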

Or you can create the function below and call it before each run:

# For standalone Keras 2.x on the TensorFlow 1.x backend. In TensorFlow 2.x,
# ConfigProto and Session exist only as tf.compat.v1.ConfigProto / tf.compat.v1.Session.
from keras.backend.tensorflow_backend import set_session
from keras.backend.tensorflow_backend import clear_session
from keras.backend.tensorflow_backend import get_session
import tensorflow
import gc

# Reset Keras Session
def reset_keras():
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    # "classifier" is from the global space - change this as you need.
    # Without the `global` declaration, `del classifier` would only target a
    # local name and the model would never be released.
    global classifier
    try:
        del classifier
    except NameError:
        pass

    print(gc.collect())  # if it does something you should see a number as output

    # use the same config as you used to create the session
    config = tensorflow.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tensorflow.Session(config=config))
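The reset_keras() function above relies on TensorFlow 1.x APIs (Session, ConfigProto). A rough TensorFlow 2.x counterpart might look like the sketch below; it assumes tf.keras, and train_once, averaged_accuracy and the small Dense model are hypothetical stand-ins for the question's code. By default TensorFlow reserves nearly all GPU memory the first time it uses the GPU, which is consistent with the "98% full" reading; enabling memory growth makes it allocate on demand instead. Note that clear_session() lets TensorFlow reuse memory inside the same process, but TensorFlow generally keeps the GPU memory it has already reserved until the process exits, so only the subprocess approach actually hands memory back to the driver.

```python
import gc

import numpy as np
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly all of it up
# front. This must run before any operation has touched the GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)


def reset_keras():
    """Rough TF 2.x counterpart of the TF1-era reset_keras()."""
    # Clear Keras' global state so models from earlier runs can be collected.
    tf.keras.backend.clear_session()
    gc.collect()


def train_once(x, y, epochs):
    # Hypothetical small model standing in for the question's Xception model.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(x.shape[1],)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(y.shape[1], activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adadelta",
                  metrics=["accuracy"])
    hist = model.fit(x, y, epochs=epochs, verbose=0)
    return hist.history["accuracy"][-1]  # the model goes out of scope here


def averaged_accuracy(x, y, n_runs, epochs):
    accuracies = []
    for _ in range(n_runs):
        accuracies.append(train_once(x, y, epochs))
        reset_keras()  # let the previous run's model be freed before the next
    return float(np.mean(accuracies)), accuracies
```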
Abhi25t