I've implemented a basic neural network from scratch using TensorFlow and trained it on the Fashion-MNIST dataset. It trains correctly and reaches a test accuracy of around 88-90% over the 10 classes.
Now I've written a predict() function that predicts the class of a given image using the trained weights. Here is the code:
import tensorflow as tf  # TF 1.x API (tf.placeholder / tf.Session)

def predict(images, trained_parameters):
    # Convert the trained NumPy weights/biases to TensorFlow tensors
    parameters = {}
    for param in trained_parameters.keys():
        parameters[param] = tf.convert_to_tensor(trained_parameters[param])

    # images are laid out as (num_features, num_examples)
    X = tf.placeholder(tf.float32, [images.shape[0], None], name='X')

    Z_L = forward_propagation(X, parameters)  # logits of the last layer
    p = tf.argmax(Z_L)                        # Working fine
    # p = tf.argmax(tf.nn.softmax(Z_L))       # not working if softmax is applied

    with tf.Session() as session:
        prediction = session.run(p, feed_dict={X: images})

    return prediction
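For context, this is roughly how I call it (a minimal sketch only; loading Fashion-MNIST via tf.keras and the (784, m) column layout of the test matrix are assumptions for illustration, not my exact notebook code):

    import numpy as np
    import tensorflow as tf

    # Load Fashion-MNIST and flatten each 28x28 image into a 784-dim column,
    # so the test matrix has shape (num_features, num_examples).
    (_, _), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()
    test_X = test_images.reshape(test_images.shape[0], -1).T.astype(np.float32) / 255.0

    preds = predict(test_X, trained_parameters)  # trained_parameters comes from training
    print(np.mean(preds == test_labels))         # test accuracy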
This uses the forward_propagation() function, which returns the weighted sum of the last layer (Z, the logits) rather than the activations (A), because TensorFlow's tf.nn.softmax_cross_entropy_with_logits() requires Z instead of A; it computes A internally by applying softmax. Refer to this link for details.
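For reference, the training cost is computed directly from the logits along these lines (a minimal sketch; the label placeholder Y, its one-hot (10, m) layout, and the transposes are assumptions about my setup, not verbatim code):

    import tensorflow as tf  # TF 1.x API

    # Z_L: logits returned by forward_propagation, shape (num_classes, num_examples)
    # Y:   one-hot labels with the same layout
    Y = tf.placeholder(tf.float32, [10, None], name='Y')
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf.transpose(Y),
                                                logits=tf.transpose(Z_L)))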
Now in the predict() function, when I make predictions using Z instead of A (the activations), it works correctly. But if I calculate softmax on Z (which gives the activations A of the last layer), it gives incorrect predictions.
Why does it give correct predictions on the weighted sums Z? Aren't we supposed to first apply the softmax activation (to compute A) and then make the prediction?
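To sharpen the question: as far as I understand, softmax is monotone per example, so taking argmax over the class axis should give the same result with or without it, and a quick standalone check seems to confirm that (random logits in the same (num_classes, num_examples) layout; this snippet is just an illustration, not my notebook code):

    import numpy as np
    import tensorflow as tf  # TF 1.x API

    Z = np.random.randn(10, 5).astype(np.float32)  # fake logits: 10 classes, 5 examples
    with tf.Session() as session:
        p_logits = session.run(tf.argmax(Z, axis=0))
        p_softmax = session.run(tf.argmax(tf.nn.softmax(Z, axis=0), axis=0))
    print(p_logits, p_softmax)  # identical when the same axis is used everywhere

Yet in my predict() function the two versions disagree.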
Here is the link to my colab notebook if anyone wants to look at my entire code: Link to Notebook Gist
So what am I missing here?