
I am training an image classifier using EfficientNetV2-L (the large variant):

import tensorflow as tf
from tensorflow.keras import layers, optimizers, losses, callbacks
from tensorflow.keras.applications import EfficientNetV2L

# Pretrained EfficientNetV2-L backbone without its classification head.
base_model = EfficientNetV2L(input_shape = (300, 500, 3),
                             include_top = False,
                             weights = 'imagenet',
                             include_preprocessing = True)

# New classification head on top of the backbone.
model = tf.keras.Sequential([base_model,
                             layers.GlobalAveragePooling2D(),
                             layers.Dropout(0.2),
                             layers.Dense(128, activation = 'relu'),
                             layers.Dropout(0.3),
                             layers.Dense(6, activation = 'softmax')])

# Stage 1: train only the new head; keep the pretrained backbone frozen.
base_model.trainable = False

model.compile(optimizer = optimizers.Adam(learning_rate = 0.001),
              loss = losses.SparseCategoricalCrossentropy(),
              metrics = ['accuracy'])

callback = [callbacks.EarlyStopping(monitor = 'val_loss', patience = 2)]
history = model.fit(ds_train, batch_size = 28, validation_data = ds_val,
                    epochs = 20, verbose = 1, callbacks = callback)

It is working properly.

model summary:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 efficientnetv2-l (Functiona  (None, 10, 16, 1280)     117746848 
 l)                                                              
                                                                 
 global_average_pooling2d (G  (None, 1280)             0         
 lobalAveragePooling2D)                                          
                                                                 
 dropout (Dropout)           (None, 1280)              0         
                                                                 
 dense (Dense)               (None, 128)               163968    
                                                                 
 dropout_1 (Dropout)         (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 6)                 774       
                                                                 
=================================================================
Total params: 117,911,590
Trainable params: 164,742
Non-trainable params: 117,746,848
_________________________________________________________________

output:

Epoch 4/20
179/179 [==============================] - 203s 1s/step - loss: 0.1559 - accuracy: 0.9474 - val_loss: 0.1732 - val_accuracy: 0.9428

But while fine-tuning it, I am unfreezing some of the layers:

base_model.trainable = True

# Keep everything before layer 900 frozen; only the later layers are fine-tuned.
fine_tune_at = 900
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

# Recompile with a lower learning rate for the fine-tuning stage.
model.compile(optimizer = optimizers.Adam(learning_rate = 0.0001),
              loss = losses.SparseCategoricalCrossentropy(),
              metrics = ['accuracy'])

history = model.fit(ds_train, batch_size = 28, validation_data = ds_val,
                    epochs = 20, verbose = 1, callbacks = callback)

model summary:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 efficientnetv2-l (Functiona  (None, 10, 16, 1280)     117746848 
 l)                                                              
                                                                 
 global_average_pooling2d (G  (None, 1280)             0         
 lobalAveragePooling2D)                                          
                                                                 
 dropout (Dropout)           (None, 1280)              0         
                                                                 
 dense (Dense)               (None, 128)               163968    
                                                                 
 dropout_1 (Dropout)         (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 6)                 774       
                                                                 
=================================================================
Total params: 117,911,590
Trainable params: 44,592,230
Non-trainable params: 73,319,360
_________________________________________________________________

And it starts training all over again. The first time, when I trained it with frozen weights, the loss decreased to 0.1559; after unfreezing the weights, the model started training again from loss = 0.444. Why is this happening? I think fine-tuning shouldn't reset the weights.

Adarsh Wase

1 Answer

When you compile the model again, a new Adam optimizer is created, so the per-parameter state it had accumulated (the moment estimates that effectively give each weight its own learning rate) is reset to its initial values; that is likely the reason for the big jump when you start training again. The model weights themselves are not reset, only the optimizer state. You can also save and load the optimizer state together with the model when saving/loading it (the Keras guide on saving and loading models covers this). You are also retraining a lot of parameters; if you keep more of them frozen, the jump might not be as high.
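
A minimal sketch of the save-and-load idea, assuming TensorFlow 2.x and the same model object as in the question (the file name stage1_model.h5 is only illustrative):

import tensorflow as tf

# model.save() stores the optimizer state along with the weights by default
# (include_optimizer=True), so Adam's moment estimates are preserved on disk.
model.save('stage1_model.h5')

# Loading the model later restores both the weights and the optimizer state.
model = tf.keras.models.load_model('stage1_model.h5')

# Note: calling model.compile() again (as the fine-tuning step does) creates
# a fresh optimizer, which discards this restored state.

This mainly helps when the two training stages run in separate sessions; within a single session, it is the recompile before fine-tuning that resets the Adam state.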

Finn Meyer