
Background

I am watching a popular YouTube crash course on machine learning.

At 3:35:50, he mentions that the model is likely overfit, so he fits it again with fewer epochs.

Since he didn't reinstantiate the model, isn't this equivalent to fitting the model with that same data, thereby continuing to overtrain it?

My Question

Assume you have a model created and data ready to go.

You run:

model.fit(train_images, train_labels, epochs=10)
model.fit(train_images, train_labels, epochs=8)

Is this equivalent to running:

model.fit(train_images, train_labels, epochs=18)

Or:

model.fit(train_images, train_labels, epochs=8)

If the previous fit were overwritten, why does running model.fit a second time start from the accuracy of the previous run?

In multiple other questions regarding saving and training models, the accepted solutions are to load the previously trained model, and run model.fit again.

If this will overwrite the pre-existing weights, doesn't that defeat the purpose of saving the model in the first place? Wouldn't training the model for the first time on the new data be equivalent?

What is the appropriate way to train a model across multiple, similar datasets while retaining accuracy across all of the data?

Brice Frisco
  • You are right. Fitting for 10 epochs and then 8 epochs is equivalent to fitting 18 epochs. Something weird is going on in the notebook in the video. Perhaps his model is reloaded when he presses play or something. The answer in the referenced question is about scikit learn models and not relevant to keras models. – Björn Lindqvist May 31 '20 at 19:20

3 Answers


Since he didn't reinstantiate the model, isn't this equivalent to fitting the model with that same data, thereby continuing to overtrain it?

You are correct! In order to check which number of epochs would do better in his example, he should have built and compiled the network again (that is, executed the cell above again).

Just remember that, in general, whenever you instantiate a model again it will most likely start with completely new weights, totally different from the previous ones (unless you seed or set them manually). So even if you keep the same number of epochs, your final accuracy can change depending on the initial weights.
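As a toy illustration in plain Python (not Keras itself): re-running the cell that builds the model gives it freshly drawn initial weights, so two fresh instantiations almost surely start from different places.

```python
import random

def build_model():
    # Stand-in for re-running the model-definition cell: each fresh
    # instantiation draws new random initial weights (unless you seed
    # the initializers manually).
    return [random.gauss(0.0, 1.0) for _ in range(3)]

m1 = build_model()
m2 = build_model()
print(m1 != m2)  # almost surely True: different starting weights
```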

Are these two commands equivalent?

model.fit(train_images, train_labels, epochs=10)
model.fit(train_images, train_labels, epochs=8)

and

model.fit(train_images, train_labels, epochs=18)

No.

In the first case, you train your network, starting from some weights X, through your entire training set 10 times; this updates the weights by some amount y. You then train the network through the entire training set another 8 times, but now starting from weights X + y.

In the second case, you train your network through your entire training set 18 times, starting from the weights X.

This is different!
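The key point, that the second fit call starts from the updated weights X + y rather than from X, can be sketched with a toy stand-in for a Keras model (plain Python, not the real API):

```python
class ToyModel:
    """Toy stand-in for a Keras model: fit() continues from the
    current weight instead of resetting it."""
    def __init__(self):
        self.w = 0.0          # initial weight "X"

    def fit(self, epochs):
        for _ in range(epochs):
            self.w += 1.0     # stand-in for one epoch of weight updates
        return self.w

m = ToyModel()
m.fit(epochs=10)              # weights are now "X + y"
second_start = m.w
m.fit(epochs=8)               # continues from "X + y", not from "X"
print(second_start)           # 10.0: the second fit did not reset the weights
```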

ihavenoidea
  • The `model` he is using is defined as a variable in the cell above. If you have one cell with `x = 10`, and a second cell below with `x = x + 10`, each time you run the second cell x will increment by 10. I believe a similar result is occurring with the model. – Brice Frisco May 31 '20 at 18:52
  • I'll double check that! – ihavenoidea May 31 '20 at 18:57
  • I understand your second point - that makes sense. Thanks! Although running model.fit twice on the same dataset is not exactly equivalent to running it once with the same number of epochs, it seems in his example that he is continuing to overtrain his dataset. I believe his intention was to clear all history of the model and run it with fewer epochs, which he is not accomplishing by running `fit` again. Does this make sense, do you agree? – Brice Frisco May 31 '20 at 19:00
  • @BriceFrisco I updated the answer, thanks for the correction! `it seems in his example that he is continuing to overtrain his dataset. I believe his intention was to clear all history of the model and run it with less epochs, which he is not accomplishing by running fit again.` Yep, I completely agree – ihavenoidea May 31 '20 at 19:08

When you run the model with

model.fit(train_images, train_labels, epochs=10)
model.fit(train_images, train_labels, epochs=8)

as you mentioned, the model is not reinitialized, so the model object still holds the weights from the previous training run and continues from them.

Note that the model is being run in a Colab notebook.

When he runs

model.fit(train_images, train_labels, epochs=10)

he is training the model for the first time and finds that it is overfitting, so he reduces the epochs to 8 and tries:

model.fit(train_images, train_labels, epochs=8)

What he wanted was a fresh run of 8 epochs, but because the notebook kept the model object from the first run, the second model.fit() call continued training it, effectively running 18 epochs total and overfitting.

As for how to avoid overfitting, one option is to use the EarlyStopping and ModelCheckpoint callbacks.
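The core idea behind EarlyStopping, stop once the validation loss has not improved for `patience` epochs, and keep the best weights seen (what `restore_best_weights=True` would restore), can be sketched in plain Python. This illustrates the logic only, not the Keras callback itself:

```python
def train_with_early_stopping(losses, patience=3):
    """Stop when the validation loss hasn't improved for `patience` epochs.

    `losses` stands in for the per-epoch validation losses a real training
    run would produce. Returns the epoch index at which training stops and
    the best loss seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss   # new best: remember it (and, in Keras, its weights)
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best   # stop: no improvement for `patience` epochs
    return len(losses) - 1, best

# Validation loss improves, then rises: a classic overfitting curve.
stop_epoch, best_loss = train_with_early_stopping(
    [0.9, 0.7, 0.5, 0.6, 0.65, 0.7, 0.8], patience=3)
print(stop_epoch, best_loss)  # stops at epoch 5, best loss 0.5
```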

Krishna Srinidhi

To avoid overfitting, you can add Dropout layers; a Dropout layer randomly drops a percentage of connections during training. You just need to add it between the Dense layers in your model.

from keras.layers import Dropout

model.add(Dropout(0.2))  # randomly drops 20% of inputs, during training only
Grzegorz Krug