I have a Keras model that consists of various layers. One of those layers is in fact another entire model. When that layer-model is VGG16, I can train the uber-model. This is the model.summary():
When I simply swap out VGG16 and swap in EfficientNetB0 (a much smaller model!), I can no longer train the uber-model. This is the model.summary():
"Ah," you say, "something something trainable weights something something gradients in memory something something."
OK, sure. Let's shrink and/or nuke a bunch of layers so that the trainable weight count falls below that of the VGG16 version. After doing so, I still can't train the uber-model! This is the model.summary():
Why can't I train this much smaller model, which has only 30% of the total weights and 25% of the trainable weights of the VGG16 version?
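For reference, the 30% / 25% figures come from comparing weight counts between the two versions; a minimal sketch of how I'm counting them (count_params is just an illustrative helper here, not part of the actual code):

from tensorflow.keras import backend as K

def count_params(model):
    # Sum the element counts of the trainable and non-trainable weight tensors;
    # the two totals should match what model.summary() reports.
    trainable = sum(K.count_params(w) for w in model.trainable_weights)
    non_trainable = sum(K.count_params(w) for w in model.non_trainable_weights)
    return trainable, non_trainable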
Sample code:
from tensorflow.keras.layers import (Input, BatchNormalization, Activation,
                                     Dense, Dropout)
from tensorflow.keras.models import Model

img = Input(shape=(224, 224, 3))
# ^ Don't reuse img= or TensorFlow will break!
x = BatchNormalization()(img)
x = Efnet(x)  # Efnet / Vgg16 are the backbone sub-models (see sketch below)
# x = Vgg16(x)
x = Activation('linear')(x)
# ^ Don't remove this pass-through or TensorFlow will break!
x = GeneralizedMeanPool2D(x)
x = NormActDrop(x)
x = DenseNormActDrop(x, 32)
x = DenseNormActDrop(x, 32)
x = DenseNormActDrop(x, 32)
x = Dense(1)(x)
model = Model(inputs=img, outputs=x)
model.summary()
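For completeness, Efnet and Vgg16 are the stock keras.applications backbones used as sub-models, built roughly like this (the weights and include_top arguments shown here are illustrative assumptions, not necessarily the exact calls used):

from tensorflow.keras.applications import EfficientNetB0, VGG16

# Headless feature extractors that output 4D feature maps for the pooling layer.
Efnet = EfficientNetB0(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
Vgg16 = VGG16(include_top=False, weights='imagenet', input_shape=(224, 224, 3))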
The custom blocks are precisely as simple as you would imagine them to be, e.g.:
def NormActDrop(x, activation_function=mish, dropout_rate=0.2):
    x = BatchNormalization()(x)
    x = Activation(activation_function)(x)
    x = Dropout(dropout_rate)(x)
    return x
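The other custom blocks follow the same pattern. Sketches of what they look like (the fixed GeM exponent p=3.0 is an arbitrary illustrative choice; the real layer may use a learnable p):

import tensorflow as tf
from tensorflow.keras.layers import Dense, Lambda

def DenseNormActDrop(x, units, activation_function=mish, dropout_rate=0.2):
    # Dense projection followed by the same norm / activation / dropout stack.
    x = Dense(units)(x)
    return NormActDrop(x, activation_function, dropout_rate)

def GeneralizedMeanPool2D(x, p=3.0, eps=1e-6):
    # Generalized-mean (GeM) pooling over the spatial dimensions:
    # clamp, raise to the power p, average over H and W, then take the p-th root.
    return Lambda(lambda t: tf.pow(
        tf.reduce_mean(tf.pow(tf.maximum(t, eps), p), axis=[1, 2]),
        1.0 / p))(x)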