
I was wondering if it is possible to save a partly trained Keras model and continue training after loading the model again.

The reason for this is that I will have more training data in the future and I do not want to retrain the whole model again.

The functions which I am using are:

#Partly train model
model.fit(first_training, first_classes, batch_size=32, nb_epoch=20)

#Save partly trained model
model.save('partly_trained.h5')

#Load partly trained model
from keras.models import load_model
model = load_model('partly_trained.h5')

#Continue training
model.fit(second_training, second_classes, batch_size=32, nb_epoch=20)

Edit 1: added fully working example

With the first dataset, after 10 epochs the loss of the last epoch is 0.0748 and the accuracy is 0.9863.

After saving, deleting and reloading the model, the loss and accuracy of the model trained on the second dataset are 0.1711 and 0.9504 respectively.

Is this caused by the new training data or by a completely re-trained model?

"""
Model by: http://machinelearningmastery.com/
"""
# load (downloaded if needed) the MNIST dataset
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
from keras.models import load_model
numpy.random.seed(7)

def baseline_model():
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, init='normal', activation='relu'))
    model.add(Dense(num_classes, init='normal', activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

if __name__ == '__main__':
    # load data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # flatten 28*28 images to a 784 vector for each image
    num_pixels = X_train.shape[1] * X_train.shape[2]
    X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
    # normalize inputs from 0-255 to 0-1
    X_train = X_train / 255
    X_test = X_test / 255
    # one hot encode outputs
    y_train = np_utils.to_categorical(y_train)
    y_test = np_utils.to_categorical(y_test)
    num_classes = y_test.shape[1]

    # build the model
    model = baseline_model()

    #Partly train model
    dataset1_x = X_train[:3000]
    dataset1_y = y_train[:3000]
    model.fit(dataset1_x, dataset1_y, nb_epoch=10, batch_size=200, verbose=2)

    # Final evaluation of the model
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))

    #Save partly trained model
    model.save('partly_trained.h5')
    del model

    #Reload model
    model = load_model('partly_trained.h5')

    #Continue training
    dataset2_x = X_train[3000:]
    dataset2_y = y_train[3000:]
    model.fit(dataset2_x, dataset2_y, nb_epoch=10, batch_size=200, verbose=2)
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))

Edit 2: tensorflow.keras remarks

For tensorflow.keras, change the parameter nb_epoch to epochs in the model fit call (an updated call is shown after the code below). The imports and baseline_model function are:

import numpy
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import load_model


numpy.random.seed(7)

def baseline_model():
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
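
The fit calls then take epochs instead of nb_epoch, for example:

#Continue training (tensorflow.keras: `epochs` replaces the old `nb_epoch` argument)
model.fit(dataset2_x, dataset2_y, epochs=10, batch_size=200, verbose=2)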
Wilmar van Ommeren
  • Have you tested it? I see no reason for it not to work. – maz Mar 08 '17 at 08:10
  • What I see now is that my accuracy drops by about 10 percent after loading the model (only in the first epochs). If reloading works, this is of course caused by the new training data. But I just want to ensure that this is indeed the case. – Wilmar van Ommeren Mar 08 '17 at 08:11
  • Are you saving your model directly with model.save or are you using a model checkpoint (https://keras.io/callbacks/#example-model-checkpoints)? If you are using model.save, could it be that you are saving the latest model (i.e. the last epoch) instead of the best one (lowest error)? Can you provide actual code? – maz Mar 08 '17 at 08:22
  • I am saving my latest model, not the best one (until this point I didn't know that was possible). I will prepare some code. – Wilmar van Ommeren Mar 08 '17 at 08:42
  • I added example code. Here the accuracy drops by 3.6 percent and the loss increases from 0.07 to 0.17 between the last epoch of the first dataset and the first epoch of the second dataset. – Wilmar van Ommeren Mar 08 '17 at 09:42
  • So couldn't you reload that and continue training on the same training data? This should assure you that reloading is OK if the results are comparable. – Marcin Możejko Mar 08 '17 at 11:08
  • Such a simple solution. You're right @MarcinMożejko. This works out. Thanks! – Wilmar van Ommeren Mar 08 '17 at 11:30
  • What about all the training parameters, such as the learning rate, for example? Are they preserved when you restart training? – Antonio Sesto Dec 01 '17 at 13:11

8 Answers


Actually, model.save saves all the information needed to restart training in your case. The only thing that could be spoiled by reloading the model is the optimizer state. To check that, try to save and reload the model and train it on the training data.
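
A minimal sketch of that check, assuming x_train and y_train are the same arrays used for the earlier epochs:

from keras.models import load_model

model.fit(x_train, y_train, epochs=1)
model.save('checkpoint.h5')   # stores architecture, weights and optimizer state
del model

model = load_model('checkpoint.h5')
# if reloading preserved everything, this epoch's loss should continue
# smoothly from where the last pre-save epoch stopped
model.fit(x_train, y_train, epochs=1)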

Marcin Możejko
  • @Marcin: when using keras `save()`, does it save the best result (lowest loss) of the model or the last result (last update) of the model? thanks – Lion Lai Nov 28 '17 at 03:45
  • Last update. The ModelCheckpoint callback is for saving the best one. – Holi Nov 30 '17 at 07:49
  • @Khaj Are you referring to this https://keras.io/callbacks/#modelcheckpoint? It seems by default, it saves the last update (not the best one); the best one is only saved if `save_best_only=True` is set explicitly. – flow2k Sep 17 '19 at 18:50
  • Question: does model.save save the learning rate in the case of learning rate scheduling or learning rate decay? I want to do online learning, and I don't want each example I get to have the same effect as the training data I used. – Mahmoud Youssef Jan 26 '21 at 09:27
  • model.save saves the learning_rate (but not the number of epochs) – Robin Davies Sep 26 '22 at 05:34

Most of the above answers cover important points. If you are using a recent TensorFlow (TF2.1 or above), then the following example will help you. The model part of the code is from the TensorFlow website.

import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def create_model():
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),  
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])

  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['accuracy'])
  return model

# Create a basic model instance
model=create_model()
model.fit(x_train, y_train, epochs = 10, validation_data = (x_test,y_test),verbose=1)

Please save the model in *.tf format. From my experience, if you have any custom_loss defined, the *.h5 format will not save the optimizer status and hence will not serve your purpose if you want to retrain the model from where you left off.

# saving the model in tensorflow format
model.save('./MyModel_tf',save_format='tf')


# loading the saved model
loaded_model = tf.keras.models.load_model('./MyModel_tf')

# retraining the model
loaded_model.fit(x_train, y_train, epochs = 10, validation_data = (x_test,y_test),verbose=1)

This approach will restart the training where it left off before the model was saved. As mentioned by others, if you want to save the weights of the best model, or the weights of the model at every epoch, you need to use the Keras ModelCheckpoint callback with options such as save_weights_only=True, save_freq='epoch', and save_best_only.
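
A hedged sketch of such a callback; the file path and monitored metric are illustrative choices:

from tensorflow.keras.callbacks import ModelCheckpoint

# keep only the model from the epoch with the best validation accuracy
checkpoint = ModelCheckpoint(
    filepath='best_model_tf',    # no .h5 suffix, so TF2 writes the SavedModel format
    monitor='val_accuracy',
    save_best_only=True,         # overwrite only when the monitored metric improves
    save_weights_only=False,     # True would store the weights without the architecture
    save_freq='epoch')

model.fit(x_train, y_train, epochs=10,
          validation_data=(x_test, y_test),
          callbacks=[checkpoint])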

For more details, please check here and another example here.

Vishnuvardhan Janapati
  • Nice, this looks very promising - thanks for the info. In this example, it seems to me as though you are retraining the model on the same data that was used for training. If so, I would have thought that the correct approach would be to load a new subset of training data to retrain on (in order to reflect the new information being introduced to the process)? – bibzzzz Apr 06 '20 at 06:34
  • @bibzzzz Agree with you. Very good comment. I wanted to demonstrate retraining on the same data to improve the performance. The gist clearly shows improvement in the performance where it was stopped before saving the model. I would completely agree with you to retrain on different data and will try it later. Thanks! – Vishnuvardhan Janapati Apr 06 '20 at 07:33
  • Excellent - you have demonstrated this very nicely, thank you. – bibzzzz Apr 06 '20 at 08:25
  • Are you sure about this "Please save the model in *.tf format. From my experience, if you have any custom_loss defined, *.h5 format will not save optimizer status" because it is never mentioned in the Keras docs. https://www.tensorflow.org/guide/keras/save_and_serialize – Wenuka Jul 20 '21 at 16:50

The problem might be that you use a different optimizer - or different arguments to your optimizer. I just had the same issue with a custom pretrained model, using

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=lr_reduction_factor,
                              patience=patience, min_lr=min_lr, verbose=1)

for the pretrained model. The original learning rate starts at 0.0003 and during pre-training it is reduced to the minimum learning rate, which is 0.000003.

I just copied that line over to the script which uses the pre-trained model and got really bad accuracies, until I noticed that the last learning rate of the pretrained model was the minimum learning rate, i.e. 0.000003. If I start with that learning rate, I get exactly the same accuracies to start with as the output of the pretrained model, which makes sense: starting with a learning rate that is 100 times bigger than the last learning rate used in the pretrained model results in a huge overshoot of gradient descent and hence in heavily decreased accuracies.
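
So before continuing training, it is worth checking which learning rate the loaded model actually carries. A small sketch; the value 0.000003 is simply the min_lr from the example above:

from keras import backend as K

# inspect the learning rate the loaded model is about to train with
print(K.get_value(model.optimizer.lr))

# resume from the final learning rate of the pre-training run
K.set_value(model.optimizer.lr, 0.000003)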

Wolfgang

Notice that Keras sometimes has issues with loaded models, as described here. This might explain cases in which you don't start from the same trained accuracy.

shahar_m

You might also be hitting concept drift; see Should you retrain a model when new observations are available?. There is also the concept of catastrophic forgetting, which a number of academic papers discuss. Here is one with MNIST: Empirical investigation of catastrophic forgetting.

Chapin

All of the above helps; you must resume from the same learning rate as the one in effect when the model and weights were saved. Set it directly on the optimizer, as in the sketch below.

Note that improvement from there is not guaranteed, because the model may have reached a local minimum, which may be global. There is no point in resuming a model in order to search for another local minimum, unless you intend to increase the learning rate in a controlled fashion and nudge the model into a possibly better minimum not far away.
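
In tf.keras, setting the rate directly on the optimizer can look like this sketch, where saved_lr is a hypothetical placeholder for the rate in effect at save time:

from tensorflow.keras.models import load_model

saved_lr = 0.0001  # hypothetical: the learning rate in effect when the model was saved

model = load_model('partly_trained.h5')
model.optimizer.learning_rate = saved_lr  # resume with the same learning rate
model.fit(second_training, second_classes, batch_size=32, epochs=20)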

flowgrad
  • Why is that? Can't I use a smaller LR than before? – lte__ Jul 10 '19 at 12:32
  • Actually, continuing training MAY get you to a better model if you receive more data. So there is a point in resuming a model in order to search for another local minimum. – Corey Levinson Oct 31 '19 at 02:44

If you are using TF2, use the new saved_model method (pb format). More information is available here and here.

# your first training
model.fit(x=X_train, y=y_train, epochs=10, callbacks=[model_callback])

# save the model in the SavedModel format
tf.saved_model.save(model, save_to_dir_path)

# delete the in-memory model
del model

# load the saved model
model = tf.keras.models.load_model(save_to_dir_path)

# your second training
model.fit(x=X_train, y=y_train, epochs=10, callbacks=[model_callback])
vimzie

It is completely okay to continue training from a saved model. I trained the saved model with the same data and found that it was giving good accuracy. Moreover, the time taken in each epoch was considerably lower.

Here is the code; have a look:

from keras.models import load_model
model = load_model('/content/drive/MyDrive/CustomResNet/saved_models/model_1.h5')
history = model.fit(train_gen, validation_data=valid_gen, epochs=5)
  • How does your answer differ from [this one](https://stackoverflow.com/a/65972427/11658924)? – Edward Ji Jun 13 '22 at 06:33
  • The real problem is that neither one of them mentions that the `save` method is an alias for `saved_model.save`. So yes, you're right that they both give essentially the same answer, but you wouldn't be able to tell that without going to the TF docs that vimzie links. – MTKnife Jan 17 '23 at 03:45