
I'm trying to build a tensorflow neural network with a sigmoid-activated hidden layer and a softmax output layer with 3 classes. The outputs are mostly very bad, and I believe it is because I am making a mistake in my model construction, since I've built a similar model in Matlab and its results were good. The data is normalized. The results look like this:

[9.2164397e-01 1.6932052e-03 7.6662831e-02]
[3.4100169e-01 2.2419590e-01 4.3480241e-01]
[2.3466848e-06 1.3276369e-04 9.9986482e-01]
[6.5199631e-01 3.4800139e-01 2.3596617e-06]
[9.9879754e-01 9.0103465e-05 1.1123115e-03]
[6.5749985e-01 2.8860433e-02 3.1363973e-01]

My nn looks like this:

def multilayer_perceptron(x, weights, biases, keep_prob):
    # hidden layer: affine transform, sigmoid activation, then dropout
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.sigmoid(layer_1)
    layer_1 = tf.nn.dropout(layer_1, keep_prob)
    # output layer: affine transform with softmax applied inside the model
    out_layer = tf.nn.softmax(tf.add(tf.matmul(layer_1, weights['out']), biases['out']))
    return out_layer

With the following cost function:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=predictions, labels=y))

I'm growing convinced that my implementation is incorrect and that I am doing something very silly. Hours on Google and looking at other examples haven't helped.

UPDATE: When I change the cost function to the one shown below, I get decent results. This feels wrong, though.

cost = tf.losses.mean_squared_error(predictions=predictions, labels=y)

1 Answer


Your cost function applies a softmax on top of your model output, which already ends in a softmax. You should remove the one in the loss function. Besides this, your code seems fine. A few things to check: Is the topology (dropout rate, number of layers, number of neurons per layer) the same in both of your models? Are you sure you didn't swap the order of your classes? What do the loss and validation loss metrics look like after both trainings?
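
For example, a minimal sketch of that fix, assuming TF 1.x (with tf.keras available) and the same placeholders (x, y, keep_prob) and weights/biases dictionaries from the question; the model keeps its softmax output, and the loss is a plain categorical cross-entropy that does not apply another softmax:

# sketch only: the network output is already softmaxed, so use a loss that
# expects probabilities rather than logits
predictions = multilayer_perceptron(x, weights, biases, keep_prob)
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y, predictions))

The usual alternative, discussed in the comments below, is to return raw logits from the network, keep tf.nn.softmax_cross_entropy_with_logits_v2 as the loss, and apply tf.nn.softmax separately when you need probabilities for prediction.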

dennis-w
  • Will look into everything now, but removing the softmax function from the output produces this: [-0.05443001 0.14635548 0.4816277 ] [ 0.06796718 -0.2497113 0.2590661 ] [-0.54969156 -0.8992554 0.86875486] – Wayde Herman Mar 16 '18 at 12:35
  • Yes, that makes sense, because now you have no softmax output for predicting. It seems like the tensorflow nn API does not have a cross-entropy without softmax, but you could use losses from tf.keras.losses. – dennis-w Mar 16 '18 at 12:43
  • "This should not cause any problems" - this will cause problems – Maxim Mar 16 '18 at 15:26
  • Yes, you're right. I forgot that softmax does something in addition to scaling the output. Thanks, I will edit it. Do you know why tf.nn uses softmax in the loss function? It seems very inconvenient to me. – dennis-w Mar 16 '18 at 15:50
  • The reason has to do with the derivative: if you compute the derivative of this as a block (softmax + loss), it simplifies to y - p; otherwise you get p/x, which is numerically less stable. See https://postimg.org/image/hdd018si3/ vs https://postimg.org/image/hdd019nd7/ – ted Mar 16 '18 at 16:17
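
To illustrate the point in the last comment, here is a small standalone TF 1.x sketch (with made-up example logits and labels) checking numerically that the gradient of the fused softmax + cross-entropy block with respect to the logits comes out as p - y (the comment writes it as y - p, a sign-convention difference):

import tensorflow as tf

# made-up example values for the check
logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[0.0, 1.0, 0.0]])

# fused op: softmax and cross-entropy computed as one block
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
grad_logits = tf.gradients(loss, logits)[0]
p = tf.nn.softmax(logits)

with tf.Session() as sess:
    g, probs = sess.run([grad_logits, p])
    print(g)                          # gradient of the loss w.r.t. the logits
    print(probs - [[0.0, 1.0, 0.0]])  # p - y: same values as the gradient above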