
I'm doing an experiment related to CNNs.

What I want to implement is gradient descent with learning rate decay and the update rule from AlexNet.

The algorithm that I want to implement is below (picture captured from the AlexNet paper):

[image: the weight update rule from the AlexNet paper]

I think I did the learning rate decay correctly; the code is below (I checked that the learning rate decays according to global_step as expected):

learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100000, 0.1, staircase=True)

Next, I should implement the update rule (weight decay of 0.005 & momentum of 0.9). I think I did the momentum correctly, but I could not find a way to implement the weight decay. The code is also below:

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=fc8))
train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(
    cross_entropy, global_step=global_step)

Am I doing the "learning rate decay" and "momentum" correctly? And how can I implement the "weight decay of 0.005" correctly?

I used tf.layers.conv2d as the convolutional layer, so the weights and biases are created inside it. The code is below:

conv5 = tf.layers.conv2d(
    inputs=conv4,
    filters=256,
    strides=1,
    kernel_size=[3, 3],
    kernel_initializer=tf.constant_initializer(pre_trained_model["conv5"][0]),
    bias_initializer=tf.constant_initializer(pre_trained_model["conv5"][1]),
    padding="SAME",
    activation=tf.nn.relu,
    name='conv5')
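
In case it helps frame the question, one idea I'm wondering about is attaching the weight decay through the kernel_regularizer argument of tf.layers.conv2d and then adding the collected regularization losses to the loss. This is only a sketch of that idea (I'm assuming an L2 penalty with the 0.005 coefficient and the REGULARIZATION_LOSSES collection is the right mechanism), not something I've verified:

conv5 = tf.layers.conv2d(
    inputs=conv4,
    filters=256,
    strides=1,
    kernel_size=[3, 3],
    kernel_initializer=tf.constant_initializer(pre_trained_model["conv5"][0]),
    bias_initializer=tf.constant_initializer(pre_trained_model["conv5"][1]),
    # assumption: L2 penalty on the kernel, scaled by the 0.005 weight decay
    kernel_regularizer=tf.contrib.layers.l2_regularizer(0.005),
    padding="SAME",
    activation=tf.nn.relu,
    name='conv5')

# the regularizer terms land in this collection; add them to the data loss
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
total_loss = cross_entropy + tf.add_n(reg_losses)
train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(
    total_loss, global_step=global_step)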
  • Please have a look at https://stackoverflow.com/a/36573850/6824418 – Allen Lavoie May 23 '17 at 23:59
  • @AllenLavoie can you help? – LKM May 24 '17 at 17:26
  • In that case, you're probably better off with [l2_regularizer](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/l2_regularizer), although you could just get the weight variable that the layer creates. – Allen Lavoie May 24 '17 at 17:56
  • @AllenLavoie I will study the l2_regularizer that you suggested, thank you. But as for getting the weight variables from the layers, how can I get them? Because I initialized the weights/biases with an initializer in each layer rather than with tf.Variable(..., name='weights'), I couldn't find a way to get the variables out of the layers. – LKM May 24 '17 at 18:02
  • I'd wrap it in a variable_scope then [retrieve the trainable variables from that](https://stackoverflow.com/questions/36533723/tensorflow-get-all-variables-in-scope). Presumably one will have 'weight' in the name and the other will have 'bias' in the name. – Allen Lavoie May 24 '17 at 18:05
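
For reference, a minimal sketch of the variable retrieval that the last comment describes; the name 'conv5/kernel' is an assumption based on the name='conv5' argument above, and the 0.005 coefficient is the one from the question, so treat this as an unverified sketch rather than a working solution:

# assumption: tf.layers.conv2d(name='conv5') creates 'conv5/kernel' and 'conv5/bias'
conv5_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='conv5')
kernels = [v for v in conv5_vars if 'kernel' in v.name]

# explicit L2 weight-decay term built from the retrieved kernels
weight_decay = 0.005 * tf.add_n([tf.nn.l2_loss(w) for w in kernels])
total_loss = cross_entropy + weight_decay
train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(
    total_loss, global_step=global_step)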

0 Answers