I have a Keras model that consists of various layers. One of those layers is in fact another entire model. When that layer-model is VGG16, I can train the uber-model. This is the model.summary():
When I simply swap out VGG16 and swap in EfficientNetB0 (a much smaller model!), I can no longer train the uber-model. This is the model.summary():
"Ah," you say, "something something trainable weights something something gradients in memory something something."
OK, sure. Let's shrink and/or nuke a bunch of layers so that the trainable weight count falls below that of the VGG16 version. After doing so, I still can't train the uber-model! This is the model.summary():
Why can't I train this much smaller model, which has only 30% of the total weights and 25% of the trainable weights of the VGG16 version?
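For reference, the 30% / 25% figures come from comparing weight counts between the two versions; a minimal sketch of how I'm counting them (count_params is just an illustrative helper here, not part of the actual code):

from tensorflow.keras import backend as K

def count_params(model):
    # Sum the element counts of the trainable and non-trainable weight tensors;
    # the two totals should match what model.summary() reports.
    trainable = sum(K.count_params(w) for w in model.trainable_weights)
    non_trainable = sum(K.count_params(w) for w in model.non_trainable_weights)
    return trainable, non_trainable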
Sample code:
from tensorflow.keras.layers import (Input, BatchNormalization, Activation,
                                     Dense, Dropout)
from tensorflow.keras.models import Model

img = Input(shape=(224, 224, 3))
# ^ Don't reuse img= or TensorFlow will break!
x = BatchNormalization()(img)
x = Efnet(x)  # Efnet / Vgg16 are the backbone sub-models (see sketch below)
# x = Vgg16(x)
x = Activation('linear')(x)
# ^ Don't remove this pass-through or TensorFlow will break!
x = GeneralizedMeanPool2D(x)
x = NormActDrop(x)
x = DenseNormActDrop(x, 32)
x = DenseNormActDrop(x, 32)
x = DenseNormActDrop(x, 32)
x = Dense(1)(x)
model = Model(inputs=img, outputs=x)
model.summary()
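For completeness, Efnet and Vgg16 are the stock keras.applications backbones used as sub-models, built roughly like this (the weights and include_top arguments shown here are illustrative assumptions, not necessarily the exact calls used):

from tensorflow.keras.applications import EfficientNetB0, VGG16

# Headless feature extractors that output 4D feature maps for the pooling layer.
Efnet = EfficientNetB0(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
Vgg16 = VGG16(include_top=False, weights='imagenet', input_shape=(224, 224, 3))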
The custom blocks are precisely as simple as you would imagine them to be, e.g.:
def NormActDrop(x, activation_function=mish, dropout_rate=0.2):
    x = BatchNormalization()(x)
    x = Activation(activation_function)(x)
    x = Dropout(dropout_rate)(x)
    return x
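The other custom blocks follow the same pattern. Sketches of what they look like (the fixed GeM exponent p=3.0 is an arbitrary illustrative choice; the real layer may use a learnable p):

import tensorflow as tf
from tensorflow.keras.layers import Dense, Lambda

def DenseNormActDrop(x, units, activation_function=mish, dropout_rate=0.2):
    # Dense projection followed by the same norm / activation / dropout stack.
    x = Dense(units)(x)
    return NormActDrop(x, activation_function, dropout_rate)

def GeneralizedMeanPool2D(x, p=3.0, eps=1e-6):
    # Generalized-mean (GeM) pooling over the spatial dimensions:
    # clamp, raise to the power p, average over H and W, then take the p-th root.
    return Lambda(lambda t: tf.pow(
        tf.reduce_mean(tf.pow(tf.maximum(t, eps), p), axis=[1, 2]),
        1.0 / p))(x)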