
Background

I am new to TensorFlow and trying to understand the basics of deep learning. I started by writing a two-layer neural network from scratch, which achieved 89% accuracy on the MNIST dataset, and now I am trying to implement the same network in TensorFlow and compare their performance.

Problem

I am not sure whether I am missing something basic, but the following implementation seems unable to update its weights and therefore produces nothing meaningful.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

num_hidden = 100
# x -> (batch_size, 784)
x = tf.placeholder(tf.float32, [None, 784])

W1 = tf.Variable(tf.zeros((784, num_hidden)))
b1 = tf.Variable(tf.zeros((1, num_hidden)))
W2 = tf.Variable(tf.zeros((num_hidden, 10)))
b2 = tf.Variable(tf.zeros((1, 10)))
# z -> (batch_size, num_hidden)
z = tf.nn.relu(tf.matmul(x, W1) + b1)
# y -> (batch_size, 10)
y = tf.nn.softmax(tf.matmul(z, W2) + b2)

# y_ -> (batch_size, 10)
y_ = tf.placeholder(tf.float32, [None, 10])
# y_ * tf.log(y) -> (batch_size, 10)
cross_entropy = -tf.reduce_sum(y_ * tf.log(y + 1e-10))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# tf.argmax(y, 1) returns the index of the maximum entry in each row
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
for epoch in range(1000):
    # batch_xs -> (100, 784)
    # batch_ys -> (100, 10), one-hot encoded
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_data = {x: batch_xs, y_: batch_ys}
    sess.run(train_step, feed_dict=train_data)
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
sess.close()

What I Have Done

I have checked the official docs and many other implementations, but I feel totally confused since they use different versions and the API varies greatly.

Could someone help me? Thank you in advance.

Mr.Robot

1 Answer


There are two problems with what you have done so far. First, you have initialised all of the weights to zero, which prevents the network from learning. Second, the learning rate of 0.05 was too high; with a lower rate the code below got me to 0.9665 test accuracy. For an explanation of why you should not initialise all of the weights to zero, see here.
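To see why the zero initialisation is fatal here, a minimal NumPy sketch of the forward pass (shapes chosen to match your network, variable names just for illustration) shows that every hidden unit is dead from the start:

import numpy as np

x = np.random.rand(5, 784)          # a small batch of inputs
W1 = np.zeros((784, 100))           # all-zero first-layer weights
b1 = np.zeros((1, 100))

z = np.maximum(0, x @ W1 + b1)      # the ReLU hidden layer
print(z.max())                      # 0.0 -- every hidden unit outputs zero

# Backprop multiplies the incoming gradient by 1 where the pre-activation is
# positive and by 0 elsewhere. The pre-activations are all zero, so no
# gradient ever reaches W1 or b1, and the gradient for W2 (z.T @ dlogits)
# vanishes too -- only b2 can move, which is why nothing meaningful is learned.

The full, corrected script: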

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


num_hidden = 100

# x -> (batch_size, 784)
x = tf.placeholder(tf.float32, [None, 784])
label_place = tf.placeholder(tf.float32, [None, 10])


# WON'T WORK, as everything is zero!
# # Gets accuracy at chance (~0.1):
# W1 = tf.Variable(tf.zeros((784, num_hidden)))
# b1 = tf.Variable(tf.zeros((1, num_hidden)))
# W2 = tf.Variable(tf.zeros((num_hidden, 10)))
# b2 = tf.Variable(tf.zeros((1, 10)))

# Will work, though you will need to train for more than 1000 steps
W1 = tf.Variable(tf.random_normal((784, num_hidden), 0., 0.1))
b1 = tf.Variable(tf.zeros((1, num_hidden)))
W2 = tf.Variable(tf.random_normal((num_hidden, 10), 0, 0.1))
b2 = tf.Variable(tf.zeros((1, 10)))

# Network: we only go as far as the linear output (logits) after the hidden
# layer, so we can feed it into tf.nn.softmax_cross_entropy_with_logits below,
# which is more numerically stable.
z = tf.nn.relu(tf.matmul(x, W1) + b1)
logits = tf.matmul(z, W2) + b2

# Define the loss as before. Note that the learning rate is lower here;
# with a higher learning rate it wasn't really working.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=label_place, logits=logits)
train_step = tf.train.GradientDescentOptimizer(.001).minimize(cross_entropy)

# continue as before
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(logits), 1), tf.argmax(label_place, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
for epoch in range(5000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_data = {x: batch_xs, label_place: batch_ys}
    sess.run(train_step, feed_dict=train_data)
print(sess.run(accuracy, feed_dict={x: mnist.test.images, label_place: mnist.test.labels}))
W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
sess.close()
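As a side note: tf.nn.softmax_cross_entropy_with_logits computes the same per-example quantity as your manual -tf.reduce_sum(y_ * tf.log(y)) loss, just via a numerically stable log-sum-exp, so you no longer need the 1e-10 fudge factor. A quick sanity check along these lines (made-up logits and labels) should print near-identical values for moderate logits:

import numpy as np
import tensorflow as tf

logits_v = np.random.randn(3, 10).astype(np.float32)   # made-up logits
labels_v = np.eye(10, dtype=np.float32)[[1, 4, 7]]     # one-hot labels

stable = tf.nn.softmax_cross_entropy_with_logits(labels=labels_v, logits=logits_v)
manual = -tf.reduce_sum(labels_v * tf.log(tf.nn.softmax(logits_v)), axis=1)

with tf.Session() as s:
    print(s.run(stable))   # computed via log-sum-exp internally
    print(s.run(manual))   # same values, but log(softmax) can underflow
                           # to -inf for extreme logits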
Ben