I want to implement my project in two steps: 1. train the network on some data; 2. fine-tune the trained network on some other data.
The first step (training the network) gives a reasonably good result. In the second step (fine-tuning the network), however, there is a problem: the parameters do not update. Details follow.
My loss has two parts: 1. the normal cost for my project; 2. the L2 regularization term. It is defined as follows:
c1 = y_conv - y_
c2 = tf.square(c1)
c3 = tf.reduce_sum(c2,1)
c4 = tf.sqrt(c3)
cost = tf.reduce_mean(c4)
regular = 0.0001 * (tf.nn.l2_loss(w_conv1) + tf.nn.l2_loss(b_conv1) +
                    tf.nn.l2_loss(w_conv2) + tf.nn.l2_loss(b_conv2) +
                    tf.nn.l2_loss(w_conv3) + tf.nn.l2_loss(b_conv3) +
                    tf.nn.l2_loss(w_conv4) + tf.nn.l2_loss(b_conv4) +
                    tf.nn.l2_loss(w_fc1) + tf.nn.l2_loss(b_fc1) +
                    tf.nn.l2_loss(w_fc2) + tf.nn.l2_loss(b_fc2))
loss = regular + cost
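To make the two parts concrete, here is a minimal sketch of the same computation in plain Python, so it runs without TensorFlow. The helper names `cost_fn` and `l2_loss` and the toy numbers are hypothetical and only for illustration; `l2_loss` mirrors the definition of `tf.nn.l2_loss`, which is `sum(t ** 2) / 2`.

```python
import math

def cost_fn(y_conv, y_):
    """Mean Euclidean distance between predicted rows and target rows."""
    distances = [
        math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_row, target_row)))
        for pred_row, target_row in zip(y_conv, y_)
    ]
    return sum(distances) / len(distances)

def l2_loss(values):
    """Same definition as tf.nn.l2_loss on a flat list: sum(v ** 2) / 2."""
    return sum(v ** 2 for v in values) / 2.0

# Toy values, chosen only for illustration.
y_conv = [[3.0, 4.0], [0.0, 0.0]]   # predictions for two samples
y_ = [[0.0, 0.0], [0.0, 0.0]]       # targets for two samples
w = [1.0, 2.0]                      # stand-in for a flattened weight tensor
b = [2.0]                           # stand-in for a bias tensor

cost = cost_fn(y_conv, y_)                   # (5.0 + 0.0) / 2 = 2.5
regular = 0.0001 * (l2_loss(w) + l2_loss(b)) # 0.0001 * (2.5 + 2.0)
loss = regular + cost
```

Note that `regular` depends only on the parameters, whereas `cost` also depends on the batch that is fed in.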
While fine-tuning the network, I print the loss, the cost, and the L2 term:
Epoch: 1 || loss = 0.184248179 || cost = 0.181599200 || regular = 0.002648979
Epoch: 2 || loss = 0.184086733 || cost = 0.181437753 || regular = 0.002648979
Epoch: 3 || loss = 0.184602532 || cost = 0.181953552 || regular = 0.002648979
Epoch: 4 || loss = 0.184308948 || cost = 0.181659969 || regular = 0.002648979
Epoch: 5 || loss = 0.184251788 || cost = 0.181602808 || regular = 0.002648979
Epoch: 6 || loss = 0.184105504 || cost = 0.181456525 || regular = 0.002648979
Epoch: 7 || loss = 0.184241678 || cost = 0.181592699 || regular = 0.002648979
Epoch: 8 || loss = 0.184189570 || cost = 0.181540590 || regular = 0.002648979
Epoch: 9 || loss = 0.184390061 || cost = 0.181741081 || regular = 0.002648979
Epoch: 10 || loss = 0.184064055 || cost = 0.181415075 || regular = 0.002648979
Epoch: 11 || loss = 0.184323867 || cost = 0.181674888 || regular = 0.002648979
Epoch: 12 || loss = 0.184519534 || cost = 0.181870555 || regular = 0.002648979
Epoch: 13 || loss = 0.183869445 || cost = 0.181220466 || regular = 0.002648979
Epoch: 14 || loss = 0.184313927 || cost = 0.181664948 || regular = 0.002648979
Epoch: 15 || loss = 0.184198738 || cost = 0.181549759 || regular = 0.002648979
As we can see, the L2 term does not change, while the cost and the loss do. To check whether the network parameters are being updated, I fetch their values:
gs, lr, solver, l, c, r, pY, bconv1 = sess.run([global_step, learning_rate, train, loss, cost, regular, y_conv, b_conv1], feed_dict={x: batch_X, y_: batch_Y, keep_prob:0.5})
Here bconv1 is one of the parameters, and I have confirmed that it does not change between two epochs. I am very confused about why the cost and the loss change while the network parameters do not.
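The comparison between two snapshots of a parameter can be sketched as follows. This is a minimal, hypothetical helper (`max_abs_change` is not a TensorFlow function) written in plain Python over flattened values; an array returned by `sess.run` could be flattened first, for example with `bconv1.ravel()`. The numbers are made up to show the two cases.

```python
def max_abs_change(before, after):
    """Largest absolute element-wise difference between two snapshots."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    return max(abs(a - b) for a, b in zip(after, before))

# Made-up snapshots of a bias vector taken at different steps.
snapshot_a = [0.10, -0.20, 0.30]
snapshot_b = [0.10, -0.20, 0.30]   # identical: no update happened
snapshot_c = [0.10, -0.25, 0.30]   # one element moved by about 0.05

unchanged = max_abs_change(snapshot_a, snapshot_b) == 0.0
changed = max_abs_change(snapshot_a, snapshot_c) > 0.0
```

A change smaller than the printed precision would not be visible in a `%.9f` log, so comparing the raw values is the more reliable check.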
The whole code, except for the CNN layers, is:
c1 = y_conv - y_
c2 = tf.square(c1)
c3 = tf.reduce_sum(c2,1)
c4 = tf.sqrt(c3)
cost = tf.reduce_mean(c4)
regular = 0.0001 * (tf.nn.l2_loss(w_conv1) + tf.nn.l2_loss(b_conv1) +
                    tf.nn.l2_loss(w_conv2) + tf.nn.l2_loss(b_conv2) +
                    tf.nn.l2_loss(w_conv3) + tf.nn.l2_loss(b_conv3) +
                    tf.nn.l2_loss(w_conv4) + tf.nn.l2_loss(b_conv4) +
                    tf.nn.l2_loss(w_fc1) + tf.nn.l2_loss(b_fc1) +
                    tf.nn.l2_loss(w_fc2) + tf.nn.l2_loss(b_fc2))
loss = regular + cost
global_step = tf.Variable(0, trainable=False)
initial_learning_rate = 0.001
learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=int(X.shape[0] / 1000),
                                           decay_rate=0.99, staircase=True)
train = tf.train.AdamOptimizer(learning_rate).minimize(loss,global_step=global_step)
batch_size = 1000
init = tf.initialize_all_variables()
saver = tf.train.Saver()
sess = tf.Session()
sess.run(init)
saver.restore(sess,'../TrainingData/convParameters.ckpt')
total_batch = int( X.shape[0]/batch_size )
for epoch in range(1000):
    # Running sums for the per-epoch averages printed below.
    L, Mcost, Reg = 0.0, 0.0, 0.0
    for i in range(total_batch):
        batch_X = X[i*batch_size:(i+1)*batch_size]
        batch_Y = Y[i*batch_size:(i+1)*batch_size]
        gs, lr, solver, l, c, r, pY, bconv1 = sess.run([global_step, learning_rate, train, loss, cost, regular, y_conv, b_conv1], feed_dict={x: batch_X, y_: batch_Y, keep_prob:0.5})
        L += l
        Mcost += c
        Reg += r
    print("Epoch: %5d || loss = %.9f || cost = %.9f || regular = %.9f"%(epoch+1,L/total_batch,Mcost/total_batch,Reg/total_batch))
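For reference, the learning-rate schedule in the code above can be written out by hand. With `staircase=True`, `tf.train.exponential_decay` computes `initial_learning_rate * decay_rate ** floor(global_step / decay_steps)`. The sketch below is plain Python; the function name `decayed_learning_rate` and the example numbers (10 batches per epoch) are assumptions for illustration only, since the real `decay_steps` depends on `X.shape[0]`.

```python
import math

def decayed_learning_rate(initial_lr, global_step, decay_steps, decay_rate,
                          staircase=True):
    """Exponential decay: initial_lr * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # decay in discrete jumps
    return initial_lr * decay_rate ** exponent

# Assumed values for illustration: 10 batches per epoch, so decay_steps = 10.
lr_start = decayed_learning_rate(0.001, 0, 10, 0.99)       # 0.001
lr_epoch_1 = decayed_learning_rate(0.001, 10, 10, 0.99)    # 0.001 * 0.99
lr_mid_epoch = decayed_learning_rate(0.001, 15, 10, 0.99)  # still 0.001 * 0.99
```

Since `gs` and `lr` are already fetched in the `sess.run` call, printing them next to the loss shows which step the schedule is at during fine-tuning. If `global_step` was stored in the checkpoint, `saver.restore` would set it to the saved value rather than 0.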
Any suggestions would be much appreciated. Thank you in advance.
zhang qiang