
I want to implement my project in two steps: 1. train the network on one data set; 2. fine-tune the trained network on another data set.

The first step (training the network) gives a reasonable result. But in the second step (fine-tuning the network) a problem appears: the parameters do not update. More details are given below.

My loss has two parts: 1. the normal cost for my project; 2. an L2 regularization term. They are defined as follows:

# cost: mean per-sample Euclidean distance between prediction and target
c1 = y_conv - y_
c2 = tf.square(c1)
c3 = tf.reduce_sum(c2,1)
c4 = tf.sqrt(c3)
cost = tf.reduce_mean(c4)
# regular: L2 penalty on all conv and fully connected weights and biases
regular = 0.0001*( tf.nn.l2_loss(w_conv1) + tf.nn.l2_loss(b_conv1) +\
              tf.nn.l2_loss(w_conv2) + tf.nn.l2_loss(b_conv2) +\
              tf.nn.l2_loss(w_conv3) + tf.nn.l2_loss(b_conv3) +\
              tf.nn.l2_loss(w_conv4) + tf.nn.l2_loss(b_conv4) +\
              tf.nn.l2_loss(w_fc1)   + tf.nn.l2_loss(b_fc1) +\
              tf.nn.l2_loss(w_fc2)   + tf.nn.l2_loss(b_fc2) )
loss = regular + cost

When fine-tuning the network, I print the loss, the cost, and the L2 term:

Epoch:     1 || loss = 0.184248179 || cost = 0.181599200 || regular = 0.002648979
Epoch:     2 || loss = 0.184086733 || cost = 0.181437753 || regular = 0.002648979
Epoch:     3 || loss = 0.184602532 || cost = 0.181953552 || regular = 0.002648979
Epoch:     4 || loss = 0.184308948 || cost = 0.181659969 || regular = 0.002648979
Epoch:     5 || loss = 0.184251788 || cost = 0.181602808 || regular = 0.002648979
Epoch:     6 || loss = 0.184105504 || cost = 0.181456525 || regular = 0.002648979
Epoch:     7 || loss = 0.184241678 || cost = 0.181592699 || regular = 0.002648979
Epoch:     8 || loss = 0.184189570 || cost = 0.181540590 || regular = 0.002648979
Epoch:     9 || loss = 0.184390061 || cost = 0.181741081 || regular = 0.002648979
Epoch:    10 || loss = 0.184064055 || cost = 0.181415075 || regular = 0.002648979
Epoch:    11 || loss = 0.184323867 || cost = 0.181674888 || regular = 0.002648979
Epoch:    12 || loss = 0.184519534 || cost = 0.181870555 || regular = 0.002648979
Epoch:    13 || loss = 0.183869445 || cost = 0.181220466 || regular = 0.002648979
Epoch:    14 || loss = 0.184313927 || cost = 0.181664948 || regular = 0.002648979
Epoch:    15 || loss = 0.184198738 || cost = 0.181549759 || regular = 0.002648979

As we can see, the L2 term does not change at all, while the cost and loss do change. To check whether the network parameters update, I fetch their values:

gs, lr, solver, l, c, r, pY, bconv1 = sess.run([global_step, learning_rate, train, loss, cost, regular, y_conv, b_conv1], feed_dict={x: batch_X, y_: batch_Y, keep_prob:0.5})

Here bconv1 is one of the parameters, and I have confirmed that bconv1 does not change between two epochs. I am very confused: why do the cost and loss change while the network parameters do not update?
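
A minimal sketch of such a check might look like this (np is numpy; b_conv1 is the first conv-layer bias from the network above):

import numpy as np

b_before = sess.run(b_conv1)                     # snapshot before one update
sess.run(train, feed_dict={x: batch_X, y_: batch_Y, keep_prob: 0.5})
b_after = sess.run(b_conv1)                      # snapshot after the update
print("b_conv1 changed:", not np.allclose(b_before, b_after))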

The whole code, except for the CNN layers, is:

c1 = y_conv - y_
c2 = tf.square(c1)
c3 = tf.reduce_sum(c2,1)
c4 = tf.sqrt(c3)
cost = tf.reduce_mean(c4)

regular = 0.0001*( tf.nn.l2_loss(w_conv1) + tf.nn.l2_loss(b_conv1) +\
              tf.nn.l2_loss(w_conv2) + tf.nn.l2_loss(b_conv2) +\
              tf.nn.l2_loss(w_conv3) + tf.nn.l2_loss(b_conv3) +\
              tf.nn.l2_loss(w_conv4) + tf.nn.l2_loss(b_conv4) +\
              tf.nn.l2_loss(w_fc1)   + tf.nn.l2_loss(b_fc1) +\
              tf.nn.l2_loss(w_fc2)   + tf.nn.l2_loss(b_fc2) )
loss = regular + cost

# global_step counts optimizer updates and drives the learning-rate decay
global_step = tf.Variable(0, trainable=False)
initial_learning_rate = 0.001

learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=int( X.shape[0]/1000 ),
                                           decay_rate=0.99, staircase=True)

train = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

batch_size = 1000
init = tf.initialize_all_variables()
saver = tf.train.Saver()
sess = tf.Session()
sess.run(init)
saver.restore(sess, '../TrainingData/convParameters.ckpt')

total_batch = int( X.shape[0]/batch_size )
for epoch in range(1000):
    L = Mcost = Reg = 0.0   # accumulate per-epoch sums for the averages printed below
    for i in range(total_batch):
        batch_X = X[i*batch_size:(i+1)*batch_size]
        batch_Y = Y[i*batch_size:(i+1)*batch_size]
        gs, lr, solver, l, c, r, pY, bconv1 = sess.run(
            [global_step, learning_rate, train, loss, cost, regular, y_conv, b_conv1],
            feed_dict={x: batch_X, y_: batch_Y, keep_prob: 0.5})
        L += l; Mcost += c; Reg += r

    print("Epoch: %5d || loss = %.9f || cost = %.9f || regular = %.9f"%(epoch+1, L/total_batch, Mcost/total_batch, Reg/total_batch))

Any suggestion would be greatly appreciated. Thank you in advance.

zhang qiang


1 Answer


Actually, I thought I had figured this problem out, but I have not; I only know what causes the bug. The reason the parameters do not update is that global_step is very large after the pre-training, so the decayed learning rate becomes tiny (about 1e-24). So what I should do is reset global_step to 0 after restoring the network parameters, and the learning rate has to be set up again as well.
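
To see how small the learning rate gets, here is a rough back-of-the-envelope check (the number of decay intervals, 4800, is only a hypothetical example, not a value from my training run):

initial_learning_rate = 0.001
decay_rate = 0.99
num_decays = 4800   # hypothetical value of global_step // decay_steps after pre-training
print(initial_learning_rate * decay_rate ** num_decays)   # roughly 1e-24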

The code should look like:

saver.restore(sess,'../TrainingData/convParameters.ckpt')
global_step = tf.Variable(0, trainable=False) 
learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=int( X.shape[0]/1000 ),decay_rate=0.99, staircase=True)

Then you can fetch the values of global_step and the learning rate to check that they are OK:

gafter,lrafter = sess.run([global_step,learning_rate])

This must be done after restoring the network parameters.

I thought I had solved this bug with the code above. However, global_step does not update during training.

What I have tried:

  1. Rebuilding the global step, learning rate, and optimizer, like this (see the sketch after this list for an untested alternative):

     global_step = tf.Variable(0, trainable=False)
     learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                                global_step=global_step,
                                                decay_steps=int( X.shape[0]/1000 ),
                                                decay_rate=0.99, staircase=True)
     train = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)
     global_step_init = tf.initialize_variables([global_step])
     sess.run(global_step_init)

     But I was told I was using an uninitialized variable.

  2. Initializing the optimizer as well:

    global_step_init = tf.initialize_variables([global_step, train])

I was told that train cannot be initialized (train is an op returned by minimize(), not a variable).
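
One more idea I have not tested yet is to reset the existing counter in place with an assign op (assuming global_step here is the same Variable object that was passed to minimize), so that the optimizer and the decayed learning rate keep using it instead of a newly created duplicate:

saver.restore(sess, '../TrainingData/convParameters.ckpt')
# Reset the restored step counter to 0 without creating a second variable;
# learning_rate is computed from this same variable, so it recovers too.
sess.run(global_step.assign(0))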

I am exhausted, so finally I gave up on resetting global_step. Instead, I simply feed the learning rate through a placeholder, as sketched below.
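
A minimal sketch of that placeholder workaround (the fed value 0.001 is just the initial learning rate from the question; any decay schedule then has to be computed on the Python side):

learning_rate = tf.placeholder(tf.float32, shape=[])
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
# ...build the rest of the graph, create the session, restore the checkpoint...
sess.run(train, feed_dict={x: batch_X, y_: batch_Y, keep_prob: 0.5,
                           learning_rate: 0.001})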

If somebody has a proper solution, please tell me. Thanks a lot.
