
I have trained the base model to good accuracy, and now I want to load those weights and use them for a model with a few additional layers, and later for hyperparameter tuning.

First I construct the new model:

input_tensor = Input(shape=train_generator.image_shape)

base_model = applications.ResNet152(weights='imagenet', include_top=False, input_tensor=input_tensor)

for layer in base_model.layers:
    layer.trainable = False

x = Flatten()(base_model.output)
x = Dense(1024, kernel_regularizer=tf.keras.regularizers.L2(l2=0.01),
          kernel_initializer=tf.keras.initializers.HeNormal(),
          kernel_constraint=tf.keras.constraints.UnitNorm(axis=0))(x)
x = LeakyReLU()(x)
x = BatchNormalization()(x)
x = Dropout(rate=0.1)(x)
x = Dense(512, kernel_regularizer=tf.keras.regularizers.L2(l2=0.01),
          kernel_initializer=tf.keras.initializers.HeNormal(),
          kernel_constraint=tf.keras.constraints.UnitNorm(axis=0))(x)
x = LeakyReLU()(x)
x = BatchNormalization()(x)

predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

Then I compile it, because at this stage that is necessary: I have to run model.fit with dummy input before I load the weights. (I think — I have tried putting these code blocks in many different orders to make it work, but I have failed each time.)

opt = tfa.optimizers.LazyAdam(learning_rate=0.000074)

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy']
    )

dummy_input = tf.random.uniform([32, 224, 224, 3])
dummy_label = tf.random.uniform([32], maxval=num_classes, dtype=tf.int32)  # sparse integer labels
hist = model.fit(dummy_input, dummy_label)

Then I load the weights for the base model:

base_model.load_weights('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/model_weights2.h5', by_name=True)

Then I load the weights for the optimizer:

import pickle
with open("/content/drive/MyDrive/weight_values2optimizer.pkl", "rb") as f:
    weights = pickle.load(f)
model.optimizer.set_weights(weights)  # set_weights returns None, so don't assign its result

This results in the following error:

ValueError: You called `set_weights(weights)` on optimizer LazyAdam
with a weight list of length 1245,
but the optimizer was expecting 13 weights.
Provided weights: [63504, array([[[[ 0.00000000e+00, -5.74126025e-04...
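
A quick diagnostic for the mismatch (a sketch using the objects above; it assumes the dummy fit has already run, so the optimizer's slot variables exist):

# The optimizer only allocates slots for *trainable* variables. With the base
# frozen, that is far fewer than the values pickled from the fully trainable model.
print(len(model.optimizer.get_weights()))  # what this optimizer allocated (13 here)
print(len(weights))                        # what was pickled (1245 here)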

Does anyone have ideas on how to solve this?

If you have a solution with Adam instead of LazyAdam, that is fine too (I have no idea whether that would make a difference).

Edit: I have tried many new things over the last couple of days, but nothing is working. Here is the entire code as it stands right now. It includes both the part where I save and the part where I load.

import tarfile
my_tar2 = tarfile.open('test.tgz')
my_tar2.extractall('test') # specify which folder to extract to
my_tar2.close()

import zipfile
with zipfile.ZipFile("/content/tot_train_bremoved2.zip", 'r') as zip_ref:
    zip_ref.extractall("/content/train/")

import pandas as pd   

train_info = pd.read_csv("/content/drive/MyDrive/train_info.csv")
test_info = pd.read_csv("/content/drive/MyDrive/test_info.csv")
train_folder = "/content/train"
test_folder = "/content/test/test"

import tensorflow as tf
import tensorflow.keras as keras

# Import everything from tensorflow.keras; mixing the standalone keras package
# with tensorflow.keras can cause hard-to-debug errors.
from tensorflow.keras.layers import Input, Lambda, Dense, Flatten, BatchNormalization, Dropout, PReLU, GlobalAveragePooling2D, LeakyReLU, MaxPooling2D
from tensorflow.keras.models import Model
# Note: this preprocess_input is the ResNet*V2* variant, while the model built
# below is applications.ResNet152 (v1); the two expect different preprocessing.
from tensorflow.keras.applications.resnet_v2 import ResNet152V2, preprocess_input
from tensorflow.keras import applications

from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.losses import sparse_categorical_crossentropy

from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint, EarlyStopping, TensorBoard

import tensorflow_addons as tfa

from sklearn.metrics import confusion_matrix
import numpy as np
import matplotlib.pyplot as plt

num_classes = 423
epochs = 20
batch_size = 32
img_height = 224
img_width = 224
IMAGE_SIZE = [img_height, img_width]

_train_generator = ImageDataGenerator(
        rotation_range=180,
        zoom_range=0.2,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.3,
        horizontal_flip=True,
        vertical_flip=True,
        preprocessing_function=preprocess_input)


_val_generator = ImageDataGenerator(
        preprocessing_function=preprocess_input)

train_generator = _train_generator.flow_from_dataframe(
    dataframe=train_info, directory=train_folder,
    x_col="filename", y_col="artist", seed=42,
    batch_size=batch_size, shuffle=True,
    class_mode="sparse", target_size=IMAGE_SIZE)

valid_generator = _val_generator.flow_from_dataframe(
    dataframe=test_info, directory=test_folder,
    x_col="filename", y_col="artist", seed=42,
    batch_size=batch_size, shuffle=True,
    class_mode="sparse", target_size=IMAGE_SIZE)

def get_uncompiled_model():

    input_tensor = Input(shape=train_generator.image_shape)

    base_model = applications.ResNet152(weights='imagenet', include_top=False, input_tensor=input_tensor)

    # Train the whole network in this first stage (nothing is frozen here).
    for layer in base_model.layers:
        layer.trainable = True

    x = Flatten()(base_model.output)

    predictions = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)

    return model

opt = keras.optimizers.Adam(learning_rate=0.000074)

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy']
    )
    return model

earlyStopping = EarlyStopping(monitor='val_loss', patience=5, verbose=0, mode='min')

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1, min_delta=1e-4, mode='min')

model = get_compiled_model()

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

# Derive the sample counts used below (len_train / len_test were otherwise undefined).
len_train = train_generator.n
len_test = valid_generator.n

model.fit(
  train_generator,
  validation_data=valid_generator,
  epochs=epochs,
  verbose=1,
  steps_per_epoch=len_train // batch_size,
  validation_steps=len_test // batch_size,
  callbacks=[earlyStopping, reduce_lr]
)

import tensorflow.keras.backend as K
import pickle

model.save_weights('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/model_weights5.h5')
# Pull the optimizer's slot variables out of the graph and pickle their values.
symbolic_weights = getattr(model.optimizer, 'weights')
weight_values = K.batch_get_value(symbolic_weights)
with open('/content/drive/MyDrive/MODELS_SAVED/optimizer3.pkl', 'wb') as f:
    pickle.dump(weight_values, f)
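
As an aside, a sketch of one way to persist the model and optimizer state together without pickling, using tf.train.Checkpoint (the checkpoint path is illustrative):

import tensorflow as tf

# Track both objects in one checkpoint; restoring it brings back the model
# weights and the optimizer's slot variables in one step.
ckpt = tf.train.Checkpoint(model=model, optimizer=model.optimizer)
ckpt.write('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/ckpt')

# Later, on an identically built and compiled model:
ckpt.restore('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/ckpt')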

# Here I am building the new model, and it's from here that I am having problems

input_tensor = Input(shape=train_generator.image_shape)

base_model = applications.ResNet152(weights='imagenet', include_top=False, input_tensor=input_tensor)

for layer in base_model.layers:
    layer.trainable = False

x = Flatten()(base_model.output)

x = Dense(512, kernel_regularizer=tf.keras.regularizers.L2(l2=0.01),
          kernel_initializer=tf.keras.initializers.HeNormal(),
          kernel_constraint=tf.keras.constraints.UnitNorm(axis=0))(x)
x = LeakyReLU()(x)
x = BatchNormalization()(x)

predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
    )

base_model.load_weights('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/model_weights5.h5', by_name=True)

with open('/content/drive/MyDrive/MODELS_SAVED/optimizer3.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

epochs = 2

model.fit(
  train_generator,
  validation_data=valid_generator,
  epochs=epochs,
  steps_per_epoch=len_train // batch_size,
  validation_steps=len_test // batch_size,
  verbose = 1,
  callbacks=[earlyStopping, reduce_lr]
)


Now I am getting the following error when running this code block (which, in the complete code above, sits right before model.fit):

with open('/content/drive/MyDrive/MODELS_SAVED/optimizer3.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)

ValueError: You called `set_weights(weights)` on optimizer Adam with a weight list of length 1245, but the optimizer was expecting 13 weights. Provided weights: [11907, array([[[[ 0.00000000e+00, -8.27514916e-04...

All I am trying to do is save the weights of the model and the optimizer, then build a new model where I add a few layers and load the weights into the base of the model along with the optimizer weights.

JKnecht
  • Where are you getting these `weight_values2optimizer.pkl` weights from? – Akshay Sehgal Dec 16 '20 at 15:24
  • @AkshaySehgal Please have a look at the complete code, where you can see where I save and load the weights. It's just a different file name: in the complete code I am saving and loading '/content/drive/MyDrive/MODELS_SAVED/optimizer3.pkl'. I was doing the exact same thing with weight_values2optimizer.pkl. – JKnecht Dec 16 '20 at 15:31
  • I think you should consider that if you change the architecture, it does not make any sense to restore the old optimizer state (the optimizer weights): first, they are not the same shape (each parameter has an associated optimizer state, and the architecture changed), and there is no easy way to re-initialize them. You should train this model with a fresh optimizer instance. – Dr. Snoopy Dec 16 '20 at 21:37
  • @Dr.Snoopy That makes sense. But in which order should I do things now if I am only loading the weights for the base model? I had this working at one point: it started the new training at a decent accuracy. But I have forgotten how I made that happen, and over the last hours I have tried many different things, yet it always starts from the beginning at accuracy 0. :( – JKnecht Dec 17 '20 at 13:44
  • @Dr.Snoopy So I run the base model first, then I save the weights, then I create the new model with the same base, then I compile it, then I load the weights, and then I fit and start training again... Unfortunately this is not working; it starts from the beginning again at accuracy 0. What am I doing wrong? – JKnecht Dec 17 '20 at 13:44

3 Answers


The two models have different architectures, so the weights of one can't be loaded into the other, irrespective of the fact that they share the same base model. I think this is a simple case of fine-tuning a model (your saved model, in this case).

What you should do is change the way you create the new model: rather than loading the original ResNet model as the base with include_top=False, try loading your saved model and implementing your own top. This can be done as:

for layer in saved_model.layers[:]:
    layer.trainable = False

x = Flatten()(saved_model.layers[-2].output)

Here the key thing is saved_model.layers[-2].output, which is the output of the second-to-last layer.

Hope this helps; if not, please clarify your doubts or let me know what I missed.
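
A minimal end-to-end sketch of the variant the asker reports working (per the comment below): save only the weights, rebuild the identical first architecture, load the weights into it, then attach the new head. get_uncompiled_model, num_classes, and the weights path come from the question; the 512-unit head mirrors the question's code:

from tensorflow.keras.layers import BatchNormalization, Dense, LeakyReLU
from tensorflow.keras.models import Model

# Rebuild the first architecture exactly and load the trained weights into it.
saved_model = get_uncompiled_model()
saved_model.load_weights('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/model_weights5.h5')

# Freeze everything that was already trained.
for layer in saved_model.layers:
    layer.trainable = False

# Build the new head on the second-to-last layer's output
# (the Flatten layer, just before the old softmax).
x = saved_model.layers[-2].output
x = Dense(512)(x)
x = LeakyReLU()(x)
x = BatchNormalization()(x)
predictions = Dense(num_classes, activation='softmax')(x)

new_model = Model(inputs=saved_model.input, outputs=predictions)
new_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])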

Prashant Gupta
  • @JKnecht, please accept the answer if it solved the issue. – Prashant Gupta Dec 28 '20 at 05:05
  • Sorry for the late reply; I have not had time to test it until today. This worked! Thank god, finally :). (For anyone else running into this problem: you still have to save the weights, not the entire model, for this to work. Then you need to use the "old" model's input when you instantiate the new model: new_model = Model(inputs=model.input, outputs=predictions).) – JKnecht Dec 30 '20 at 17:23

Please refer to the following notebook: https://colab.research.google.com/drive/1j_zLqG1zUMi6UYPdc6gtmkJvHuawL4Sk?usp=sharing

  1. {save,load}_weights on a model include the weights of the optimizer (when using the TensorFlow checkpoint format; the HDF5 format stores only the model weights). That would be the preferred way to initialise the optimizer weights.

  2. You can copy the optimizer from one model to another.

  3. The reason you are getting the error above is that the optimizer doesn't allocate its weights until training starts. If you really want to do it manually, trigger model.fit() for one epoch on a single data point and then set the weights manually (see the sketch after this list).
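
A sketch of that one-step warm-up (the input shape and sparse label are illustrative; it assumes the pickled weight_values were saved from an architecturally identical model):

import tensorflow as tf

# One tiny training step forces the optimizer to create its slot variables.
dummy_x = tf.random.uniform([1, 224, 224, 3])    # one fake image
dummy_y = tf.zeros([1], dtype=tf.int32)          # one sparse integer label
model.fit(dummy_x, dummy_y, epochs=1, verbose=0)

# Only now do the optimizer's weights exist, so set_weights can succeed
# (provided the saved slots match this model's trainable variables).
model.optimizer.set_weights(weight_values)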

You can replace

base_model.load_weights('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/model_weights5.h5', by_name=True)

with open('/content/drive/MyDrive/MODELS_SAVED/optimizer3.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)

with:

base_model.load_weights('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/model_weights5.h5', by_name=True)

model.optimizer = base_model.optimizer
Pedro Marques
  • But you are using the same model when you load the weights. That is not the case for me, and that is the reason I am getting the error: I am loading the weights into a model where I have added a couple of layers. – JKnecht Dec 18 '20 at 07:48

After saving the first model's weights with model.save_weights('name.h5'), you should build a second model exactly like the first one; let's call it model2. Then load the weights you saved into it with model2.load_weights('name.h5'). Use model.summary() to see the names and the number of layers in the first model. For each layer, define a variable and fetch that layer's weights (and biases) into it with the get_weights() method. Here is an example:

x1 = model2.layers[1].get_weights()

Here, I put the weights and biases of the first layer (which in my model was a convolution layer) into the variable x1.

x1[0] is a list of the weights of layer #1.

x1[1] is a list of the biases of layer #1.
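
Building on this, a sketch of copying the fetched weights into a new model (new_model is hypothetical; it must have layers of matching shapes at the same indices):

# set_weights expects the same list structure that get_weights returns
# (e.g. [kernel, bias] for a Dense or Conv layer).
for i, layer in enumerate(model2.layers[:-1]):   # skip the old output layer
    new_model.layers[i].set_weights(layer.get_weights())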

Sara