
I have a checkpoint that was trained with 11 classes. I added one class to my dataset and am trying to restore the checkpoint in order to retrain the CNN, but it gives me a shape-related error because the previous model was trained with 11 classes while the new one has 12. Did I save the weights and biases variables the right way? What should I do? Here is the code:

batch_size = 10
num_hidden = 64
num_channels = 1
depth = 32
....
graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, IMAGE_SIZE_H, IMAGE_SIZE_W, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  w_b = {
     'weight_0': tf.Variable(tf.random_normal([patch_size_1, patch_size_1, num_channels, depth],stddev=0.1)),
     'weight_1': tf.Variable(tf.random_normal([patch_size_2, patch_size_2, depth, depth], stddev=0.1)),
     'weight_2': tf.Variable(tf.random_normal([patch_size_3, patch_size_3, depth, depth], stddev=0.1)),
     'weight_3': tf.Variable(tf.random_normal([IMAGE_SIZE_H // 32 * IMAGE_SIZE_W // 32 * depth, num_hidden], stddev=0.1)),
     'weight_4': tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=0.1)),

     'bias_0' : tf.Variable(tf.zeros([depth])), 
     'bias_1' : tf.Variable(tf.constant(1.0, shape=[depth])),
     'bias_2' : tf.Variable(tf.constant(1.0, shape=[depth])),
     'bias_3' : tf.Variable(tf.constant(1.0, shape=[num_hidden])),
     'bias_4' : tf.Variable(tf.constant(1.0, shape=[num_labels]))
        }

  # Model.
  def model(data):

    conv_1 = tf.nn.conv2d(data, w_b['weight_0'], [1, 2, 2, 1], padding='SAME')
    hidden_1 = tf.nn.relu(conv_1 + w_b['bias_0'])
    pool_1 = tf.nn.max_pool(hidden_1, ksize=[1, 5, 5, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv_2 = tf.nn.conv2d(pool_1, w_b['weight_1'], [1, 2, 2, 1], padding='SAME')
    hidden_2 = tf.nn.relu(conv_2 + w_b['bias_1'])
    conv_3 = tf.nn.conv2d(hidden_2, w_b['weight_2'], [1, 2, 2, 1], padding='SAME')
    hidden_3 = tf.nn.relu(conv_3 + w_b['bias_2'])
    pool_2 = tf.nn.max_pool(hidden_3, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
    shape = pool_2.get_shape().as_list()
    reshape = tf.reshape(pool_2, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden_4 = tf.nn.relu(tf.matmul(reshape, w_b['weight_3']) + w_b['bias_3'])

    return tf.matmul(hidden_4, w_b['weight_4']) + w_b['bias_4']


  # Training computation.
  logits = model(tf_train_dataset)

  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))


  optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)



  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

  init = tf.initialize_all_variables()
  w_b_saver = tf.train.Saver(var_list = w_b)

num_steps = 1001


with tf.Session(graph=graph) as sess:
  ckpt = "/home/..../w_b_models.ckpt"
  if os.path.isfile(ckpt):
    w_b_saver.restore(sess, ckpt)
    print("restore complete")
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))
  else:
    print("No checkpoint found, training from scratch.")
    print('Initialized')
    sess.run(init)

    for step in range(num_steps):
      .....

    accuracy(test_prediction.eval(), test_labels, force=False)
    save_path_w_b = w_b_saver.save(sess, "/home/...../w_b_models.ckpt")
    print("Model saved in file: %s" % save_path_w_b)

and here is the error:

 InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [64,12] rhs shape= [64,11]
 [[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@Variable_4"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_4, save/restore_slice_9/_12)]]
peter

1 Answer

I believe the problem is that you need to remove this variable from w_b before you save and restore as you're doing.

Remove this:

'weight_4': tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=0.1)),

Then it should work. The main reason is that you're changing the number of labels while expecting the checkpoint to restore into that same variable. As a side note, it's better to use tf.get_variable instead of tf.Variable.
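For example, the last layer could be written in the tf.get_variable style like this (a minimal sketch; the names, shapes, and initializers just mirror the definitions above):

# Same shapes as in w_b, but created via tf.get_variable so the variables
# get stable, explicit names in the checkpoint.
weight_4 = tf.get_variable("weight_4", [num_hidden, num_labels],
                           initializer=tf.random_normal_initializer(stddev=0.1))
bias_4 = tf.get_variable("bias_4", [num_labels],
                         initializer=tf.constant_initializer(1.0))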


Updated answer:

Make a new dict, w_b_to_save, that references every variable except the final layer:

w_b_to_save = {
 'weight_0': w_b['weight_0'],
 'weight_1': w_b['weight_1'],
 'weight_2': w_b['weight_2'],
 'weight_3': w_b['weight_3'],

 'bias_0': w_b['bias_0'],
 'bias_1': w_b['bias_1'],
 'bias_2': w_b['bias_2'],
 'bias_3': w_b['bias_3'],
    }

...

w_b_saver = tf.train.Saver(var_list = w_b_to_save)

Now you'll be able to save and restore just the variables you want. Keeping a second dict that points at the same variables is a bit redundant, but it makes the point: you can't both save the last layer and restore it while changing its shape.
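Putting it together, the restore step might then look like this sketch (it assumes the reshaped final layer lives in a separate dict, called w_b_not_restored as in the comments below, so it gets initialized rather than restored):

w_b_saver = tf.train.Saver(var_list = w_b_to_save)

with tf.Session(graph=graph) as sess:
  # Restore the layers whose shapes did not change between the
  # 11-class and 12-class runs...
  w_b_saver.restore(sess, ckpt)
  # ...and initialize only the new output layer, which has no
  # matching shape in the checkpoint.
  sess.run(tf.initialize_variables(list(w_b_not_restored.values())))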

Steven
  • And what about the return in the function? I am using 'weight_4' and 'bias_4', which make up a linear layer (matrix multiplication by a weight matrix followed by adding a vector of biases). What should I write instead of them if I delete 'weight_4': tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=0.1)),? – peter Sep 23 '16 at 21:20
  • You could keep them in another dict that isn't passed to the saver, e.g. w_b_not_restored = {'weight_4': tf.get_variable('weight_4', [num_hidden, num_labels], initializer=tf.random_normal_initializer(stddev=0.1)), 'bias_4': tf.get_variable('bias_4', [num_labels], initializer=tf.constant_initializer(1.0))} – Steven Sep 24 '16 at 17:49
  • Yes, I have split the variables: w_b_to_save contains the list of weights and biases, and w_b_not_restored contains just weight_4 and bias_4. It restores now, but how can I retrain the new class? Should I assign some variable? – peter Sep 24 '16 at 18:56
  • There are several things you can do. 1) Freeze the previous weights and just train the new classifier (see the sketch after these comments). 2) Try to initialize the weights from some other model by passing in the values as a numpy array. 3) Just train the whole thing. Maybe you can clarify what you mean by assigning it some value? You can randomly initialize it like you do for the others, and you can also change the saver for the next run if you actually want to save all of your weights: tf.train.Saver(var_list=all_of_the_weights_and_biases) – Steven Sep 24 '16 at 19:32
  • I mean assigning some variables as in this link (http://stackoverflow.com/a/33662680/5548115). I am a beginner in TensorFlow; can you please give me more details, in terms of my code, on the first option, i.e. how I could freeze the previous weights? – peter Sep 24 '16 at 20:22
  • Here's how to freeze weights. http://stackoverflow.com/questions/35298326/freeze-some-variables-scopes-in-tensorflow-stop-gradient-vs-passing-variables – Steven Sep 25 '16 at 02:06
  • I am limited by characters in the comments, and this is really a new question. Anyway, you can create a placeholder for predefined weights if you already know them: my_placeholder = tf.placeholder(...), then a variable my_variable = tf.get_variable(...), and finally my_variable.assign(my_placeholder). That's one way to initialize a variable when you already know what the weights should be. Feel free to start another question, or look here for more help: http://stackoverflow.com/documentation/tensorflow/topics – Steven Sep 25 '16 at 02:11
  • If your question has been answered, please mark it as done :) – Steven Sep 26 '16 at 14:41
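To make option 1 from the comments concrete, here is a short sketch of freezing the restored layers: pass only the new final-layer variables to the optimizer so that nothing else receives gradient updates (it assumes the w_b_not_restored dict discussed above):

# Only the variables in var_list are updated by this optimizer;
# all of the layers restored from the checkpoint stay frozen.
new_layer_vars = [w_b_not_restored['weight_4'], w_b_not_restored['bias_4']]
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss, var_list=new_layer_vars)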