
I'm running multiple nested loops to do a hyperparameter grid search. Each loop iterates over a list of hyperparameter values, and inside the innermost loop a Keras Sequential model is built and evaluated with a generator. (I'm not doing any training; I'm just randomly initializing the model, evaluating it several times, and recording the average loss.)

My problem is that during this process Keras seems to fill up my GPU memory, so that I eventually get an OOM error.

Does anybody know how to solve this, i.e. how to free the GPU memory each time after a model has been evaluated?

I do not need the model at all after it has been evaluated; I can throw it away entirely each time before building a new one in the next pass of the inner loop.

I'm using the Tensorflow backend.

Here is the code, although much of it isn't relevant to the general problem. The model is built inside the fourth loop,

for fsize in fsizes:

I guess the details of how the model is built don't matter much, but here is all of it anyway:

model_losses = []
model_names = []

for activation in activations:
    for i in range(len(layer_structures)):
        for width in layer_widths[i]:
            for fsize in fsizes:

                model_name = "test_{}_struc-{}_width-{}_fsize-{}".format(activation,i,np.array_str(np.array(width)),fsize)
                model_names.append(model_name)
                print("Testing new model: ", model_name)

                #Structure for this network
                structure = layer_structures[i]

                row, col, ch = 80, 160, 3  # Input image format

                model = Sequential()

                model.add(Lambda(lambda x: x/127.5 - 1.,
                          input_shape=(row, col, ch),
                          output_shape=(row, col, ch)))

                for j in range(len(structure)):
                    if structure[j] == 'conv':
                        model.add(Convolution2D(width[j], fsize, fsize))
                        model.add(BatchNormalization(axis=3, momentum=0.99))
                        if activation == 'relu':
                            model.add(Activation('relu'))
                        elif activation == 'elu':
                            model.add(ELU())
                        model.add(MaxPooling2D())
                    elif structure[j] == 'dense':
                        if structure[j-1] != 'dense':
                            # flatten once, before the first dense layer of the block
                            model.add(Flatten())
                        model.add(Dense(width[j]))
                        model.add(BatchNormalization(axis=1, momentum=0.99))
                        if activation == 'relu':
                            model.add(Activation('relu'))
                        elif activation == 'elu':
                            model.add(ELU())

                model.add(Dense(1))

                average_loss = 0
                for k in range(5):
                    model.compile(optimizer="adam", loss="mse")
                    val_generator = generate_batch(X_val, y_val, resize=(160,80))
                    loss = model.evaluate_generator(val_generator, len(y_val))
                    average_loss += loss

                average_loss /= 5

                model_losses.append(average_loss)

                print("Average loss after 5 initializations: {:.3f}".format(average_loss))
                print()

3 Answers


As indicated in the question, the backend being used is TensorFlow. With the TensorFlow backend the current model is not destroyed when you move on to the next one, so you need to clear the session.

After you are done with a model, just put:

from keras import backend as K

if K.backend() == 'tensorflow':
    K.clear_session()
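
In the code from the question, this goes at the bottom of the innermost loop, once the model's average loss has been recorded. A minimal sketch, assuming a hypothetical build_and_evaluate() helper that wraps the Sequential construction and evaluate_generator calls shown above:

from keras import backend as K

for fsize in fsizes:
    # build_and_evaluate is a hypothetical helper wrapping the model
    # construction and evaluation from the question
    model, average_loss = build_and_evaluate(fsize)
    model_losses.append(average_loss)

    # drop the Python reference and reset the TensorFlow graph/session,
    # releasing the GPU memory held by the discarded model
    del model
    if K.backend() == 'tensorflow':
        K.clear_session()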

You can also use the scikit-learn wrapper to do the grid search; check this example: here. For more advanced hyperparameter search you can use hyperas.

    You are amazing! Thanks a lot, this is exactly what I needed to understand. And thanks a lot for pointing me to hyperopt/hyperas, too! – Alex Feb 05 '17 at 02:51
  • I thought `keras.wrappers.scikit_learn.KerasClassifier` took care of it. I didn't face problems running the given example. If you are facing problem please submit an issue at Keras github. – indraforyou Jan 10 '18 at 10:33

Using the tip given by indraforyou, I added the code to clear the TensorFlow session inside the function I pass to GridSearchCV, like this:

from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

def create_model():
    # clear the previous TensorFlow graph/session before building a new model
    K.clear_session()

    inputs = Input(shape=(4096,))
    x = Dense(2048, activation='relu')(inputs)
    p = Dense(2, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=p)
    model.compile(optimizer='SGD',
                  loss='mse',
                  metrics=['accuracy'])
    return model

And then I can invoke the grid search:

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

model = KerasClassifier(build_fn=create_model)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
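
Note that param_grid has to exist before constructing the GridSearchCV. Since create_model takes no arguments here, only fit-level parameters can be searched; a minimal hypothetical grid:

# hypothetical grid: create_model exposes no tunable arguments,
# so only fit-level parameters such as epochs and batch_size can vary
param_grid = {
    'epochs': [5, 10],
    'batch_size': [32, 64],
}
grid_result = grid.fit(X, y)  # X, y: your training data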

It should work.

Cheers!


Adding backend.clear_session() to the model-building function worked for me:

import tensorflow as tf
from tensorflow.keras import backend
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def model_builder(hp):
    # reset the graph/session before building each trial's model
    backend.clear_session()
    model = Sequential()
    hp_drop = hp.Float('drop', min_value=0, max_value=0.2, step=0.025)
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(hp_drop))
    model.add(Dense(1, activation="relu"))

    model.compile(
        loss='mean_absolute_error',
        optimizer=tf.keras.optimizers.Adam(0.001),
        metrics=["mean_absolute_percentage_error"]
    )
    return model
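
The hp argument means this builder is intended for Keras Tuner. A minimal usage sketch, assuming a random search and hypothetical training data x_train/y_train:

import keras_tuner as kt

# hypothetical search settings; objective and max_trials are assumptions
tuner = kt.RandomSearch(
    model_builder,
    objective='val_loss',
    max_trials=10,
)
tuner.search(x_train, y_train, validation_split=0.2, epochs=10)
best_model = tuner.get_best_models(num_models=1)[0]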