I'm training a modified InceptionV3 model with multi_gpu_model in Keras, and I use model.save to save the whole model.
After a training run, I closed and restarted the IDE and used load_model to reinstantiate the model.
The problem is that I am not able to resume training exactly where I left off.
Here is the code:
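from keras.utils import multi_gpu_model

# Replicate the template model on 2 GPUs and train the parallel wrapper
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch=num_images/batch_size, epochs=num_epochs)
# Save the template model (not the parallel wrapper)
model.save('my_model.h5')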
Before I closed the IDE, the loss was around 0.8.
After restarting the IDE, reloading the model, and re-running the above code, the loss jumped to 1.5.
But according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one.
So I don't understand why the loss is larger after resuming training.
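To make concrete why the optimizer state matters for an exact resume, here is a minimal sketch in plain Python (not Keras). It assumes a single scalar weight, a toy quadratic loss, and a hand-written RMSprop-style update; `rmsprop_step`, `grad_fn`, `train`, and the hyperparameter values are hypothetical names and numbers chosen for illustration, not Keras internals. It only shows that restoring the weights without the optimizer's running average changes the updates that follow; it does not claim this is what happens inside multi_gpu_model.

```python
import math


def rmsprop_step(w, cache, grad, lr=0.01, rho=0.9, eps=1e-7):
    """One RMSprop-style update on a scalar weight (illustrative values)."""
    # Running average of squared gradients: this is the "optimizer state".
    cache = rho * cache + (1.0 - rho) * grad * grad
    w = w - lr * grad / (math.sqrt(cache) + eps)
    return w, cache


def grad_fn(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


def train(w, cache, steps):
    """Run `steps` updates starting from the given weight and state."""
    for _ in range(steps):
        w, cache = rmsprop_step(w, cache, grad_fn(w))
    return w, cache


# Reference: 20 uninterrupted steps.
w_full, _ = train(0.0, 0.0, 20)

# "Checkpoint" after 10 steps: keep both the weight and the optimizer state.
w_ckpt, cache_ckpt = train(0.0, 0.0, 10)

# Resume with weight + optimizer state: identical to the uninterrupted run.
w_resumed, _ = train(w_ckpt, cache_ckpt, 10)

# Resume with the weight only (state reset to zero): the trajectory differs.
w_reset, _ = train(w_ckpt, 0.0, 10)

print(w_resumed == w_full)  # True
print(w_reset == w_full)    # False
```

In this toy setting, the run that restores both the weight and the running average reproduces the uninterrupted run exactly, while the run that restores only the weight takes different step sizes right after the restart.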
EDIT: If I don't use multi_gpu_model and just use the ordinary model, I'm able to resume exactly where I left off.