I am trying to implement the SGD weight update manually in Python with Caffe, instead of using the solver.step() function. The goal is to obtain exactly the same weight updates from solver.step() as from manually updating the weights.
The setup is as follows: use MNIST data, and in solver.prototxt set random_seed: 52, momentum: 0.0, base_lr: 0.01, and lr_policy: "fixed".
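For reference, a minimal solver.prototxt along these lines might look like the sketch below (the net path, solver_mode, and the explicit weight_decay: 0.0 are my assumptions, added since regularization is meant to be excluded):

net: "lenet_train_test.prototxt"  # assumed path to the MNIST net definition
base_lr: 0.01
lr_policy: "fixed"
momentum: 0.0
weight_decay: 0.0  # assumed, so that regularization does not enter the update
random_seed: 52
solver_mode: GPU  # assumed, since the script calls caffe.set_mode_gpu()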
The above is done so that I can implement the plain SGD update equation (without momentum, regularization, etc.). The equation is simply:
W_{t+1} = W_t - lr * dW_t
where lr is the learning rate and dW_t is the gradient stored in each blob's diff after the backward pass.
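As a toy NumPy illustration of this rule, with made-up numbers (not from the actual net):

import numpy as np

W = np.array([0.5, -0.3])   # current weights W_t
dW = np.array([0.2, 0.1])   # gradient dW_t (what blob.diff holds)
lr = 0.01                   # base_lr
W = W - lr * dW             # W_{t+1}; here array([ 0.498, -0.301])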
Following are the two tests:
Test 1: Use Caffe's forward() and backward() to compute the forward and backward propagation. For each layer that contains weights I do:
for k in weight_layer_idx:
    solver.net.layers[k].blobs[0].diff[...] *= lr  # weights
    solver.net.layers[k].blobs[1].diff[...] *= lr  # biases
Next, still inside the same loop, update the weights/biases as:

    solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
    solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff
I run this for 5 iterations.
Test 2: Run Caffe's solver.step(5).
Now, I expect the two tests to yield exactly the same weights after the five iterations.
I save the weight values after each of the above tests and compute the norm of the difference between the weight vectors from the two tests, and I see that they are not bit-exact. Can someone spot something I might be missing?
Following is the entire code for reference:
import caffe
from copy import copy  # needed for the copy() calls below
import numpy as np

caffe.set_device(0)
caffe.set_mode_gpu()
niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')
# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_solver_step = copy(solver.net.layers[1].blobs[1].data.astype('float64'))
# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
lr = 0.01
momentum = 0.
# Get layer types
layer_types = []
for ll in solver.net.layers:
    layer_types.append(ll.type)
# Get the indices of layers that have weights in them
weight_layer_idx = [idx for idx,l in enumerate(layer_types) if 'Convolution' in l or 'InnerProduct' in l]
for it in range(1, niter+1):
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr
        solver.net.layers[k].blobs[1].diff[...] *= lr
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff
# save the weights to compare later
w_fwdbwd_update = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_fwdbwd_update = copy(solver.net.layers[1].blobs[1].data.astype('float64'))
# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)
The last line, which compares the weights from the two tests, produces:
after iter 5 : weight diff: 0.000203027766144 and bias diff: 1.78390789051e-05
whereas I expect this difference to be 0.0.
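For what it's worth, to check for bit-exactness rather than an aggregate norm, one could also compare element-wise; a small sketch using the arrays saved above:

print "max abs weight diff:", np.abs(w_solver_step - w_fwdbwd_update).max()
print "bit-exact:", np.array_equal(w_solver_step, w_fwdbwd_update)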
Any ideas?