I'm training a modified InceptionV3 model with Keras's multi_gpu_model, and I use model.save to save the whole model.

Then I closed and restarted the IDE and used load_model to reinstantiate the model.

The problem is that I am not able to resume the training exactly where I left off.

Here is the code:

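from keras.utils import multi_gpu_model

# model is the modified InceptionV3 (freshly built, or returned by load_model after a restart)
parallel_model = multi_gpu_model(model, gpus=2)

parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch=num_images/batch_size, epochs=num_epochs)

# Save the original model, not parallel_model
model.save('my_model.h5')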

Before I closed the IDE, the loss was around 0.8.

After restarting the IDE, reloading the model, and re-running the code above, the loss jumped to 1.5.

However, according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one.

So I don't understand why the loss becomes larger after resuming the training.

EDIT: If I don't use the multi_gpu_model and just use the ordinary model, I'm able to resume exactly where I left off.

chaohuang

2 Answers

1

When you call multi_gpu_model(...), Keras automatically sets the weights of your model to some default values (at least in version 2.2.0, which I am currently using). That is why training did not resume from the point where you saved the model.

I solved the issue by replacing the weights of the parallel model with the weights from the sequential (single-GPU) model:

parallel_model = multi_gpu_model(model, gpus=2)

# You can check the index of the sequential model with parallel_model.summary()
parallel_model.layers[-2].set_weights(model.get_weights())

parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
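
The hard-coded index -2 depends on how the wrapper was built. Below is a minimal sketch of looking the layer up by name instead, assuming the parallel model lists the original model as a nested layer under the original model's name (as the summary() hint above suggests). restore_template_weights is a hypothetical helper, not part of Keras, and it relies only on the layers, name, get_weights, and set_weights attributes.

```python
def restore_template_weights(parallel_model, template_model):
    """Copy the template model's weights into the matching nested layer.

    Hypothetical helper (not a Keras API). Returns the index of the layer
    that received the weights so it can be compared with summary().
    """
    for index, layer in enumerate(parallel_model.layers):
        # Assumption: the nested layer carries the template model's name.
        if layer.name == template_model.name:
            layer.set_weights(template_model.get_weights())
            return index
    raise ValueError(
        "no layer named %r in the parallel model" % template_model.name
    )
```

With the objects from the snippet above, the call would be restore_template_weights(parallel_model, model), made before parallel_model.compile(...).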

I hope this will help you.

saul19am
0

@saul19am When you compile the model again, only the weights and the model structure carry over; the optimizer state is still lost. I think this can help.
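
To illustrate why the optimizer state matters, here is a toy, pure-Python sketch of RMSprop on a single scalar weight. It is not Keras code: rmsprop_step and train are hypothetical names, the objective f(w) = w**2 is made up, and the hyperparameters are only assumed to resemble common RMSprop defaults. It shows that resuming with the saved accumulator reproduces the uninterrupted run, while resuming with a fresh accumulator does not. It does not claim to explain the exact jump from 0.8 to 1.5 in the question.

```python
import math


def rmsprop_step(weight, grad, accumulator, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update for a scalar weight.

    Returns the new weight and the new running average of squared gradients.
    """
    accumulator = rho * accumulator + (1.0 - rho) * grad * grad
    weight = weight - lr * grad / (math.sqrt(accumulator) + eps)
    return weight, accumulator


def train(weight, accumulator, steps):
    # Toy objective f(w) = w**2, whose gradient is 2*w.
    for _ in range(steps):
        weight, accumulator = rmsprop_step(weight, 2.0 * weight, accumulator)
    return weight, accumulator


# Uninterrupted run: 20 steps from w = 1.0 with an empty accumulator.
w_full, _ = train(1.0, 0.0, 20)

# Stop after 10 steps, then resume with weights AND optimizer state.
w_mid, acc_mid = train(1.0, 0.0, 10)
w_resumed, _ = train(w_mid, acc_mid, 10)

# Resume with the weights only: the accumulator restarts at zero.
w_reset, _ = train(w_mid, 0.0, 10)

print(w_resumed == w_full)  # same trajectory as the uninterrupted run
print(w_reset == w_full)    # different trajectory after the reset
```

With a fresh accumulator the first few steps after the restart are larger than they would have been, so the run diverges from the uninterrupted one even though the weights were restored correctly.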

  • Your answer looks like a comment to me. Please do not answer with a comment. Understandably, your rep is too low to comment, but that still does not mean answers should be used to make comments as an alternative. – Dang Nguyen Jan 04 '19 at 08:25