13

I have the following data shapes:

X_Train.shape,Y_Train.shape
Out[52]: ((983, 19900), (983,))
X_Test.shape,Y_Test.shape
Out[53]: ((52, 19900), (52,))

I am running a simple binary classifier, as Y_Train and Y_Test can each be either 1 or 2:

import keras
import tensorflow as tf
from keras import layers
from keras.layers import Input, Dense
from keras.models import Model, Sequential
import numpy as np
from keras.optimizers import Adam

myModel = keras.Sequential([
    keras.layers.Dense(1000,activation=tf.nn.relu,input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.softmax)
])

myModel.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['accuracy'])
myModel.fit(X_Train, Y_Train, epochs=100,batch_size=1000)
test_loss,test_acc=myModel.evaluate(X_Test,Y_Test)

Output of the Code

Training Loss and Accuracy

Epoch 1/100
983/983 [==============================] - 1s 1ms/step - loss: nan - acc: 0.4608
Epoch 2/100
983/983 [==============================] - 0s 206us/step - loss: nan - acc: 0.4873
Epoch 3/100
983/983 [==============================] - 0s 200us/step - loss: nan - acc: 0.4883
Epoch 4/100
983/983 [==============================] - 0s 197us/step - loss: nan - acc: 0.4883
Epoch 5/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4873
Epoch 6/100
983/983 [==============================] - 0s 202us/step - loss: nan - acc: 0.4863
Epoch 7/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4863
Epoch 8/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4883
Epoch 9/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4873
Epoch 10/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4873
Epoch 11/100
983/983 [==============================] - 0s 200us/step - loss: nan - acc: 0.4893
Epoch 12/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4873
Epoch 13/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4873
Epoch 14/100
983/983 [==============================] - 0s 197us/step - loss: nan - acc: 0.4883
...
Epoch 97/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4893
Epoch 98/100
983/983 [==============================] - 0s 199us/step - loss: nan - acc: 0.4883
Epoch 99/100
983/983 [==============================] - 0s 193us/step - loss: nan - acc: 0.4883
Epoch 100/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4863

Testing Loss and Accuracy

test_loss,test_acc
Out[58]: (nan, 0.4615384661234342)

I also checked whether there are any NaN values in my data:

np.isnan(X_Train).any()
Out[5]: False
np.isnan(Y_Train).any()
Out[6]: False
np.isnan(X_Test).any()
Out[7]: False
np.isnan(Y_Test).any()
Out[8]: False

My question is: why is my training accuracy not improving, and why is the loss nan? Also, why does the softmax in the output layer appear to work without one-hot encoding the labels?

Note 1: I apologize that my data is too big to share here, but if there is a way to share it, I am ready to do so.

Note 2: There are a lot of zero values in my training data.

Naseer
  • Possible duplicate of [NaN loss when training regression network](https://stackoverflow.com/questions/37232782/nan-loss-when-training-regression-network) – Ashwin Geet D'Sa May 20 '19 at 09:50
  • 2
    Your batch size in `model.fit()` seems inappropriate. You definitely need to normalize your data. – Danny Fang May 20 '19 at 09:57

3 Answers

9

Sometimes in Keras the combination of ReLU and softmax causes numerical trouble, because ReLU can produce large positive values corresponding to very small probabilities.

Try using tanh instead of ReLU.
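
For example, a minimal sketch of that change applied to the hidden layers of the model from the question (the output layer and loss are left untouched here; see the other answers for those):

myModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.tanh, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.tanh),
    keras.layers.Dense(32, activation=tf.nn.tanh),
    keras.layers.Dense(1, activation=tf.nn.softmax)  # output layer unchanged; see the other answers
])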

AdriBento
  • Is my final-layer activation function right, given that I am using softmax without one-hot encoding the output data? – Naseer May 20 '19 at 12:30
8

If you are getting NaN values for the loss, it means that the input is outside of the function's domain. There are multiple reasons why this could occur. Here are a few steps to track down the cause:

1) If an input is outside of the function domain, determine what those inputs are. Track the progression of the input values fed into your cost function.

2) Check whether there are any null or NaN values in the input data set. This can be done with:

DataFrame.isnull().any() 

3) Change the scaling of the input data. Normalize the data between 0 and 1 and then start training (see the sketch below).

4) Change the method of weight initialization (also in the sketch below).

It's difficult to point to an exact solution with deep neural networks, so try the methods above; they should give you a fair idea of what is going wrong.
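
A minimal sketch of points 3) and 4), assuming X_Train and X_Test from the question are plain NumPy arrays (the scaled-variable names and the He initializer choice are illustrative, not prescriptive):

import keras
import numpy as np

# 3) Scale the inputs to [0, 1] using the training-set range;
#    the small epsilon avoids division by zero for constant columns.
x_min = X_Train.min(axis=0)
x_max = X_Train.max(axis=0)
X_Train_scaled = (X_Train - x_min) / (x_max - x_min + 1e-8)
X_Test_scaled = (X_Test - x_min) / (x_max - x_min + 1e-8)

# 4) Change the weight initialization, e.g. He initialization for ReLU layers.
layer = keras.layers.Dense(64, activation='relu', kernel_initializer='he_normal')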

Chandan M S
  • OK, I am now trying normalization, and yes, you are right, I need to track this down to the very beginning of my dataset, where I read the data from the files. – Naseer May 20 '19 at 12:24
7

Softmax activation is not the right choice here, because you have only one neuron in the output layer.

Let's consider how the softmax function is defined (formula from wikipedia.org):

softmax(z_i) = exp(z_i) / Σ_j exp(z_j)

Since there is only one neuron in the last layer, softmax(z_i) will be 1 for every value of z_i.

Since you are using sparse_categorical_crossentropy, Keras (or TensorFlow) infers the number of classes from the shape of the logits. In Keras (or TensorFlow) the shape of the logits is assumed to be [BATCH_SIZE, NUM_CLASSES]. The shape of your logits is [None, 1], so Keras assumes that the number of classes is 1, but you are feeding more than one class (1 or 2), and that is causing the error.

The right activation function here is sigmoid (tanh could also work, by altering the dataset targets to be either -1 or 1). The loss should be binary_crossentropy.

myModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(1, activation="sigmoid")
])

myModel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
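
One caveat when trying this, since the labels in the question are 1 and 2 while binary_crossentropy with a single sigmoid output expects 0 and 1: a minimal sketch of remapping them before fitting (the batch size of 32 is only an illustrative value, echoing the comment above about the original batch size exceeding the dataset size):

# Shift labels from {1, 2} to {0, 1} before training with binary_crossentropy.
Y_Train_bin = Y_Train - 1
Y_Test_bin = Y_Test - 1

myModel.fit(X_Train, Y_Train_bin, epochs=100, batch_size=32)
test_loss, test_acc = myModel.evaluate(X_Test, Y_Test_bin)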
Mitiku
  • How can I use softmax here? I know that in that case I need two neurons at the output and one-hot encoding for the labels, so can you guide me a little bit? – Naseer May 20 '19 at 13:16
  • So the labels then need to be converted to one-hot encoded vectors; in my specific case, if I specify two neurons at the output layer, then the Y_Train dimension should be (983, 2) and the Y_Test dimension should be (52, 2), am I right? – Naseer May 20 '19 at 13:20
  • That could also work, but I've provided a simpler solution; check it out. – Mitiku May 20 '19 at 13:33
  • You are right, man, thanks. Now I am getting a loss value, but as before my accuracy does not improve beyond 0.48. Why is it not improving at all? I am really sick of this 0.48 number! – Naseer May 20 '19 at 13:47
  • 1
    You are getting this low accuracy because your input dimensionality is very high (19900) and the model is not able to deal with such a high-dimensional input space. This problem is generally known as the [Curse of Dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality). It would be mitigated if the dataset were larger. You have two options: 1. increase the size of the dataset, or 2. find a way to decrease the number of input features ([Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)). – Mitiku May 21 '19 at 07:57
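
For reference, a minimal sketch of the two-output softmax variant discussed in the comments above, assuming the labels are remapped and one-hot encoded so that Y_Train has shape (983, 2) and Y_Test has shape (52, 2) (the model name is illustrative):

from keras.utils import to_categorical

# Remap labels from {1, 2} to {0, 1}, then one-hot encode to shape (n_samples, 2).
Y_Train_oh = to_categorical(Y_Train - 1, num_classes=2)
Y_Test_oh = to_categorical(Y_Test - 1, num_classes=2)

softmaxModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(2, activation='softmax')  # two output neurons, one per class
])
softmaxModel.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
softmaxModel.fit(X_Train, Y_Train_oh, epochs=100, batch_size=32)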