
I want to implement the MLP model taught in https://www.coursera.org/learn/machine-learning using TensorFlow. Here's my implementation.

# one hidden layer MLP

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# MNIST loader from the TF tutorials; labels must be one-hot so they match y below
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder(tf.float32, shape=[None, 784])
y = tf.placeholder(tf.float32, shape=[None, 10])

W_h1 = tf.Variable(tf.random_normal([784, 512]))
h1 = tf.nn.sigmoid(tf.matmul(x, W_h1))

W_out = tf.Variable(tf.random_normal([512, 10]))
y_ = tf.matmul(h1, W_out)

# cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(y_, y)
cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1)
loss = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# train
with tf.Session() as s:
    s.run(tf.initialize_all_variables())

    for i in range(10000):
        batch_x, batch_y = mnist.train.next_batch(100)
        s.run(train_step, feed_dict={x: batch_x, y: batch_y})

        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch_x, y: batch_y})
            print('step {0}, training accuracy {1}'.format(i, train_accuracy))

However, it does not work. I think the definitions of the layers are correct, but the problem is in the cross_entropy. If I use the first one, the one that is commented out, the model converges quickly; but if I use the second one, which I think/hope is a translation of the course's cost equation, the model won't converge.

If you want to take a look at the cost equation, you can find it here.
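For reference, this is the (unregularized) cost from the course as I understand it, with m training examples and K = 10 output units:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ y_k^{(i)} \log\big(h_\Theta(x^{(i)})_k\big) + \big(1 - y_k^{(i)}\big) \log\big(1 - h_\Theta(x^{(i)})_k\big) \Big]$$

The tf.reduce_sum(..., 1) followed by tf.reduce_mean in my code is meant to be the inner sum over k and the 1/m average over the batch.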

Update

I have implemented this same MLP model using numpy and scipy, and it works.

In the TensorFlow code, I added a print line in the training loop, and I found out that all the elements in y_ are NaN... I think it is caused by arithmetic overflow or something similar.
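Roughly, the print I added inside the training loop (a sketch):

# evaluate y_ on the current batch and print it
preds = s.run(y_, feed_dict={x: batch_x, y: batch_y})
print(preds)  # every element shows up as nan after the first few steps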

– David S.
  • I think those two cost functions expect different 'y_'. The first wants the raw linear output and the second wants the linear outputs scaled between 1 and 0 by the sum of all categories. The scaling can be done by tf.nn.softmax. – user728291 Jan 29 '16 at 06:55
  • I don't think your first loss is what you meant to use. The common one is `softmax_cross_entropy_with_logits`. Please take some time to read the official tutorial in tensorflow at https://www.tensorflow.org/versions/0.6.0/tutorials/mnist/tf/index.html#tensorflow-mechanics-101 or https://www.tensorflow.org/versions/0.6.0/tutorials/mnist/beginners/index.html#mnist-for-ml-beginners – colinfang Jan 29 '16 at 20:55
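A minimal sketch of the softmax route the comments suggest (using the same TF 0.x positional API as the question; the logits name is mine):

# keep the raw matmul output as logits; the op applies softmax and the
# cross-entropy together in a numerically stable way, so no manual log is needed
logits = tf.matmul(h1, W_out)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, y)
loss = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)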

2 Answers


It is likely a 0*log(0) issue.

Try replacing

cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1)

with

cross_entropy = tf.reduce_sum(- y * tf.log(tf.clip_by_value(y_, 1e-10, 1.0)) - (1 - y) * tf.log(tf.clip_by_value(1 - y_, 1e-10, 1.0)), 1)

Please see Tensorflow NaN bug?
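The clip matters because the hand-written loss feeds tf.log values that can hit exactly 0 (or, with the bare matmul output, go negative): log(0) is -inf and 0 * log(0) is NaN, and one such batch poisons the weights. A tiny numpy illustration of the failure mode (hypothetical values, not from the question's run):

import numpy as np

y_true = np.array([1.0, 0.0])
y_hat = np.array([0.0, 0.0])  # predictions that hit exactly 0
# element 0: -1*log(0) -> inf; element 1: -0*log(0) -> nan (the 0*log(0) case)
print(-y_true * np.log(y_hat) - (1 - y_true) * np.log(1 - y_hat))  # [inf nan], plus divide-by-zero warnings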

– satojkovic
  • I kinda thought it is like 0 * log(0) problem. I just could not find a way to solve it in TF. Thanks a lot~ – David S. Apr 05 '16 at 03:44

The problem, I think, is that nn.sigmoid_cross_entropy_with_logits expects unnormalized results (the raw logits), whereas the function you replaced it with,

cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1)

expects y_ to be normalized (by the sigmoid) to between 0 and 1.

Try replacing

y_ = tf.matmul(h1, W_out)

with

y_ = tf.nn.sigmoid(tf.matmul(h1, W_out))
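Combined with the clipping from the other answer, a sketch of how the relevant lines might look (untested; the rest of the question's code stays the same):

y_ = tf.nn.sigmoid(tf.matmul(h1, W_out))  # squash logits into (0, 1)
cross_entropy = tf.reduce_sum(
    - y * tf.log(tf.clip_by_value(y_, 1e-10, 1.0))
    - (1 - y) * tf.log(tf.clip_by_value(1 - y_, 1e-10, 1.0)), 1)
loss = tf.reduce_mean(cross_entropy)

The accuracy op does not need to change: the sigmoid is monotonic, so tf.argmax picks the same index whether it sees the logits or the sigmoid outputs.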
– Daniel Slater