
I'm running multiple nested loops to do a hyperparameter grid search. Each loop iterates over a list of hyperparameter values, and inside the innermost loop a Keras Sequential model is built and evaluated with a generator. (I'm not doing any training; I'm just randomly initializing the model, evaluating it several times, and recording the average loss.)

My problem is that during this process Keras seems to fill up my GPU memory, so that I eventually get an OOM error.

Does anybody know how to solve this, i.e. how to free the GPU memory each time after a model has been evaluated?

I do not need the model at all after it has been evaluated; I can throw it away entirely each time before building a new one in the next pass of the inner loop.

I'm using the Tensorflow backend.

Here is the code, although much of it isn't relevant to the general problem. The model is built inside the fourth loop,

for fsize in fsizes:

I guess the details of how the model is built don't matter much, but here is all of it anyway:

model_losses = []
model_names = []

for activation in activations:
    for i in range(len(layer_structures)):
        for width in layer_widths[i]:
            for fsize in fsizes:

                model_name = "test_{}_struc-{}_width-{}_fsize-{}".format(activation,i,np.array_str(np.array(width)),fsize)
                model_names.append(model_name)
                print("Testing new model: ", model_name)

                #Structure for this network
                structure = layer_structures[i]

                row, col, ch = 80, 160, 3  # Input image format

                model = Sequential()

                model.add(Lambda(lambda x: x/127.5 - 1.,
                          input_shape=(row, col, ch),
                          output_shape=(row, col, ch)))

                for j in range(len(structure)):
                    if structure[j] == 'conv':
                        model.add(Convolution2D(width[j], fsize, fsize))
                        model.add(BatchNormalization(axis=3, momentum=0.99))
                        if activation == 'relu':
                            model.add(Activation('relu'))
                        elif activation == 'elu':
                            model.add(ELU())
                        model.add(MaxPooling2D())
                    elif structure[j] == 'dense':
                        if structure[j-1] != 'dense':
                            # flatten once, before the first dense layer of the block
                            model.add(Flatten())
                        model.add(Dense(width[j]))
                        model.add(BatchNormalization(axis=1, momentum=0.99))
                        if activation == 'relu':
                            model.add(Activation('relu'))
                        elif activation == 'elu':
                            model.add(ELU())

                model.add(Dense(1))

                average_loss = 0
                for k in range(5):
                    model.compile(optimizer="adam", loss="mse")
                    val_generator = generate_batch(X_val, y_val, resize=(160,80))
                    loss = model.evaluate_generator(val_generator, len(y_val))
                    average_loss += loss

                average_loss /= 5

                model_losses.append(average_loss)

                print("Average loss after 5 initializations: {:.3f}".format(average_loss))
                print()

3 Answers


As indicated in the question, the backend being used is TensorFlow. With the TensorFlow backend the current model is not destroyed when you move on to the next one, so you need to clear the session.

After you are done with a model, just put:

from keras import backend as K

if K.backend() == 'tensorflow':
    K.clear_session()
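
In the code from the question, this goes at the bottom of the innermost loop, once the model's average loss has been recorded. A minimal sketch, assuming a hypothetical build_and_evaluate() helper that wraps the Sequential construction and evaluate_generator calls shown above:

from keras import backend as K

for fsize in fsizes:
    # build_and_evaluate is a hypothetical helper wrapping the model
    # construction and evaluation from the question
    model, average_loss = build_and_evaluate(fsize)
    model_losses.append(average_loss)

    # drop the Python reference and reset the TensorFlow graph/session,
    # releasing the GPU memory held by the discarded model
    del model
    if K.backend() == 'tensorflow':
        K.clear_session()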

You can also use the scikit-learn wrapper to do the grid search; check this example: here. For more advanced hyperparameter search you can use hyperas.

    You are amazing! Thanks a lot, this is exactly what I needed to understand. And thanks a lot for pointing me to hyperopt/hyperas, too! – Alex Feb 05 '17 at 02:51
  • I thought `keras.wrappers.scikit_learn.KerasClassifier` took care of it. I didn't face problems running the given example. If you are facing problem please submit an issue at Keras github. – indraforyou Jan 10 '18 at 10:33

Using the tip given by indraforyou, I added the code to clear the TensorFlow session inside the function I pass to GridSearchCV, like this:

from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

def create_model():
    # clear the previous TensorFlow graph/session before building a new model
    K.clear_session()

    inputs = Input(shape=(4096,))
    x = Dense(2048, activation='relu')(inputs)
    p = Dense(2, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=p)
    model.compile(optimizer='SGD',
                  loss='mse',
                  metrics=['accuracy'])
    return model

And then I can invoke the grid search:

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

model = KerasClassifier(build_fn=create_model)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
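
Note that param_grid has to exist before constructing the GridSearchCV. Since create_model takes no arguments here, only fit-level parameters can be searched; a minimal hypothetical grid:

# hypothetical grid: create_model exposes no tunable arguments,
# so only fit-level parameters such as epochs and batch_size can vary
param_grid = {
    'epochs': [5, 10],
    'batch_size': [32, 64],
}
grid_result = grid.fit(X, y)  # X, y: your training data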

It should work.

Cheers!


Adding backend.clear_session() to the model-building function worked for me:

import tensorflow as tf
from tensorflow.keras import backend
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def model_builder(hp):
    # reset the graph/session before building each trial's model
    backend.clear_session()
    model = Sequential()
    hp_drop = hp.Float('drop', min_value=0, max_value=0.2, step=0.025)
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(hp_drop))
    model.add(Dense(1, activation="relu"))

    model.compile(
        loss='mean_absolute_error',
        optimizer=tf.keras.optimizers.Adam(0.001),
        metrics=["mean_absolute_percentage_error"]
    )
    return model
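
The hp argument means this builder is intended for Keras Tuner. A minimal usage sketch, assuming a random search and hypothetical training data x_train/y_train:

import keras_tuner as kt

# hypothetical search settings; objective and max_trials are assumptions
tuner = kt.RandomSearch(
    model_builder,
    objective='val_loss',
    max_trials=10,
)
tuner.search(x_train, y_train, validation_split=0.2, epochs=10)
best_model = tuner.get_best_models(num_models=1)[0]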