
I want to implement the MLP model taught in https://www.coursera.org/learn/machine-learning using TensorFlow. Here's my implementation.

# one hidden layer MLP

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# MNIST loader from the TF tutorials; labels must be one-hot so they match y below
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder(tf.float32, shape=[None, 784])
y = tf.placeholder(tf.float32, shape=[None, 10])

W_h1 = tf.Variable(tf.random_normal([784, 512]))
h1 = tf.nn.sigmoid(tf.matmul(x, W_h1))

W_out = tf.Variable(tf.random_normal([512, 10]))
y_ = tf.matmul(h1, W_out)

# cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(y_, y)
cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1)
loss = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# train
with tf.Session() as s:
    s.run(tf.initialize_all_variables())

    for i in range(10000):
        batch_x, batch_y = mnist.train.next_batch(100)
        s.run(train_step, feed_dict={x: batch_x, y: batch_y})

        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch_x, y: batch_y})
            print('step {0}, training accuracy {1}'.format(i, train_accuracy))

However, it does not work. I think the definitions of the layers are correct, but the problem is in the cross_entropy. If I use the first one, the one that is commented out, the model converges quickly; but if I use the second one, which I think/hope is a translation of the course's cost equation, the model won't converge.

If you want to take a look at the cost equation, you can find it here.
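For reference, this is the (unregularized) cost from the course as I understand it, with m training examples and K = 10 output units:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ y_k^{(i)} \log\big(h_\Theta(x^{(i)})_k\big) + \big(1 - y_k^{(i)}\big) \log\big(1 - h_\Theta(x^{(i)})_k\big) \Big]$$

The tf.reduce_sum(..., 1) followed by tf.reduce_mean in my code is meant to be the inner sum over k and the 1/m average over the batch.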

Update

I have implemented this same MLP model using numpy and scipy, and it works.

In the TensorFlow code, I added a print line in the training loop, and I found out that all the elements in y_ are NaN... I think it is caused by arithmetic overflow or something similar.
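Roughly, the print I added inside the training loop (a sketch):

# evaluate y_ on the current batch and print it
preds = s.run(y_, feed_dict={x: batch_x, y: batch_y})
print(preds)  # every element shows up as nan after the first few steps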

– David S.
  • I think those two cost functions expect different 'y_'. The first wants the raw linear output and the second wants the linear outputs scaled between 1 and 0 by the sum of all categories. The scaling can be done by tf.nn.softmax. – user728291 Jan 29 '16 at 06:55
  • I don't think your first loss is what you meant to use. The common one is `softmax_cross_entropy_with_logits`. Please take some time to read the official tutorial in tensorflow at https://www.tensorflow.org/versions/0.6.0/tutorials/mnist/tf/index.html#tensorflow-mechanics-101 or https://www.tensorflow.org/versions/0.6.0/tutorials/mnist/beginners/index.html#mnist-for-ml-beginners – colinfang Jan 29 '16 at 20:55
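A minimal sketch of the softmax route the comments suggest (using the same TF 0.x positional API as the question; the logits name is mine):

# keep the raw matmul output as logits; the op applies softmax and the
# cross-entropy together in a numerically stable way, so no manual log is needed
logits = tf.matmul(h1, W_out)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, y)
loss = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)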

2 Answers


It is likely a 0*log(0) issue.

Try replacing

cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1)

with

cross_entropy = tf.reduce_sum(- y * tf.log(tf.clip_by_value(y_, 1e-10, 1.0)) - (1 - y) * tf.log(tf.clip_by_value(1 - y_, 1e-10, 1.0)), 1)

Please see Tensorflow NaN bug?
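The clip matters because the hand-written loss feeds tf.log values that can hit exactly 0 (or, with the bare matmul output, go negative): log(0) is -inf and 0 * log(0) is NaN, and one such batch poisons the weights. A tiny numpy illustration of the failure mode (hypothetical values, not from the question's run):

import numpy as np

y_true = np.array([1.0, 0.0])
y_hat = np.array([0.0, 0.0])  # predictions that hit exactly 0
# element 0: -1*log(0) -> inf; element 1: -0*log(0) -> nan (the 0*log(0) case)
print(-y_true * np.log(y_hat) - (1 - y_true) * np.log(1 - y_hat))  # [inf nan], plus divide-by-zero warnings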

– satojkovic
  • I kinda thought it is like 0 * log(0) problem. I just could not find a way to solve it in TF. Thanks a lot~ – David S. Apr 05 '16 at 03:44

The problem, I think, is that nn.sigmoid_cross_entropy_with_logits expects unnormalized results (the raw logits), whereas the function you replaced it with,

cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1)

expects y_ to be normalized (by the sigmoid) to between 0 and 1.

Try replacing

y_ = tf.matmul(h1, W_out)

with

y_ = tf.nn.sigmoid(tf.matmul(h1, W_out))
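Combined with the clipping from the other answer, a sketch of how the relevant lines might look (untested; the rest of the question's code stays the same):

y_ = tf.nn.sigmoid(tf.matmul(h1, W_out))  # squash logits into (0, 1)
cross_entropy = tf.reduce_sum(
    - y * tf.log(tf.clip_by_value(y_, 1e-10, 1.0))
    - (1 - y) * tf.log(tf.clip_by_value(1 - y_, 1e-10, 1.0)), 1)
loss = tf.reduce_mean(cross_entropy)

The accuracy op does not need to change: the sigmoid is monotonic, so tf.argmax picks the same index whether it sees the logits or the sigmoid outputs.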
– Daniel Slater