I have set up a very simple multi-layer perceptron with a single hidden layer using a sigmoid transfer function, and mock data with 2 inputs.
I based it on the Simple Feedforward Neural Network using TensorFlow example on GitHub. I won't post the whole thing here, but my cost function is set up like this:
import numpy
import tensorflow

# Backward propagation
loss = tensorflow.losses.mean_squared_error(labels=y, predictions=yhat)
cost = tensorflow.reduce_mean(loss, name='cost')
updates = tensorflow.train.GradientDescentOptimizer(0.01).minimize(cost)
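For what it's worth, my understanding is that tensorflow.losses.mean_squared_error already returns the mean over the batch, so the reduce_mean should be a no-op here; a minimal sketch of what I believe the cost reduces to (assuming y and yhat both have shape [None, 1]):

# what I believe the cost above amounts to: the mean of the squared
# differences over the batch (cost_manual is just a name for this check)
cost_manual = tensorflow.reduce_mean(tensorflow.square(y - yhat))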
Then I simply loop through a bunch of epochs, the intention being that my weights are optimised via the updates operation at every step:
with tensorflow.Session() as sess:
    init = tensorflow.global_variables_initializer()
    sess.run(init)

    for epoch in range(10):
        # Train with each example
        for i in range(len(train_X)):
            feed_dict = {X: train_X[i: i + 1], y: train_y[i: i + 1]}
            res = sess.run([updates, loss], feed_dict)
            print("epoch {}, step {}. w_1: {}, loss: {}".format(epoch, i, w_1.eval(), res[1]))

        train_result = sess.run(predict, feed_dict={X: train_X, y: train_y})
        train_errors = abs((train_y - train_result) / train_y)
        train_mean_error = numpy.mean(train_errors, axis=1)

        test_result = sess.run(predict, feed_dict={X: test_X, y: test_y})
        test_errors = abs((test_y - test_result) / test_y)
        test_mean_error = numpy.mean(test_errors, axis=1)

        print("Epoch = %d, train error = %.5f%%, test error = %.5f%%"
              % (epoch, 100. * train_mean_error[0], 100. * test_mean_error[0]))

    sess.close()  # redundant inside the with block, but harmless
I would expect the output of this program to show that, at each epoch and for each step, the weights are updated, with a loss value that broadly decreases over time.
However, while I see the loss value and errors decreasing, the weights change only once, after the first step, and then remain fixed for the remainder of the program.
What is going on here?
Here is what is printed to screen during the first 2 epochs:
epoch 0, step 0. w_1: [[0. 0.]
[0. 0.]], loss: 492.525634766
epoch 0, step 1. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 482.724365234
epoch 0, step 2. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 454.100799561
epoch 0, step 3. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 418.499267578
epoch 0, step 4. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 387.509033203
Epoch = 0, train error = 84.78731%, test error = 88.31780%
epoch 1, step 0. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 355.381134033
epoch 1, step 1. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 327.519226074
epoch 1, step 2. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 301.841705322
epoch 1, step 3. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 278.177368164
epoch 1, step 4. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 257.852508545
Epoch = 1, train error = 69.24779%, test error = 76.38461%
Besides the weights not changing, it's also interesting that both values within each row of w_1 are identical. The loss itself keeps decreasing. Here is what the last epoch looks like:
epoch 9, step 0. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 13.5048065186
epoch 9, step 1. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 12.4460296631
epoch 9, step 2. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 11.4702644348
epoch 9, step 3. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 10.5709943771
epoch 9, step 4. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], loss: 10.0332946777
Epoch = 9, train error = 13.49328%, test error = 33.56935%
What am I doing incorrectly here? I know that the weights are being updated somewhere because I can see the training and test errors changing, but why can't I see this in the printed values?
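In case eval() is somehow reading a stale value, a variation I could try is to fetch the weights in the same run call as the update; a minimal sketch (the extra fetch is the only change from the loop above):

# fetch w_1 alongside the update op instead of calling eval() afterwards;
# note: a value fetched in the same run as the update may reflect the state
# either before or after the update is applied
res = sess.run([updates, loss, w_1], feed_dict)
print("epoch {}, step {}. w_1: {}, loss: {}".format(epoch, i, res[2], res[1]))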
EDIT: As per squadrick's request, here is the code for w_1 and yhat:
# Layer's sizes
x_size = train_X.shape[1] # Number of input nodes
y_size = train_y.shape[1] # Number of outcomes
# Symbols
X = tensorflow.placeholder("float", shape=[None, x_size], name='X')
y = tensorflow.placeholder("float", shape=[None, y_size], name='y')
# Weight initializations
w_1 = tensorflow.Variable(tensorflow.zeros((x_size, x_size)))
w_2 = tensorflow.Variable(tensorflow.zeros((x_size, y_size)))
# Forward propagation
h = tensorflow.nn.sigmoid(tensorflow.matmul(X, w_1))
yhat = tensorflow.matmul(h, w_2)
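One thing worth noting about this setup: since w_1 starts at all zeros, every hidden activation at initialisation should be sigmoid(0) = 0.5, regardless of the input; a quick numpy check:

# sigmoid(0) = 1 / (1 + e^0) = 0.5, so with w_1 all zeros each hidden
# unit outputs 0.5 for any input
print(1.0 / (1.0 + numpy.exp(-0.0)))  # 0.5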
EDIT2: squadrick's suggestion to look at w_2 is interesting; when I add w_2 to the print statement with the following:
print("epoch {}, step {}. w_1: {}, w_2: {}, loss: {}".format(epoch, i, w_1.eval(), w_2.eval(), res[1]))
I see that it does actually update:
epoch 0, step 0. w_1: [[0. 0.]
[0. 0.]], w_2: [[0.22192918]
[0.22192918]], loss: 492.525634766
epoch 0, step 1. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], w_2: [[0.44163907]
[0.44163907]], loss: 482.724365234
epoch 0, step 2. w_1: [[0.5410637 0.5410637]
[0.5803371 0.5803371]], w_2: [[0.8678319]
[0.8678319]], loss: 454.100799561
So now it looks like the issue is that only w_2 is being updated, not w_1. I'm still not sure why this would be happening, though.
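To dig further, one thing I could check is whether any gradient actually reaches w_1; a minimal sketch using tf.gradients, assuming the graph above (grad_w1 and grad_w2 are just names I've picked for this check):

# gradients of the cost with respect to each weight matrix
grad_w1, grad_w2 = tensorflow.gradients(cost, [w_1, w_2])

# inside the training loop, evaluated with the same feed_dict as the update
g1, g2 = sess.run([grad_w1, grad_w2], feed_dict)
print("grad w_1: {}, grad w_2: {}".format(g1, g2))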