
I'm learning neural networks and have implemented object classification on the CIFAR-10 dataset using the Keras library. Here is my neural network, defined with Keras:

# Define the model and train it
model = Sequential()

model.add(Dense(units = 60, input_dim = 1024, activation = 'relu'))
model.add(Dense(units = 50, activation = 'relu'))
model.add(Dense(units = 60, activation = 'relu'))
model.add(Dense(units = 70, activation = 'relu'))
model.add(Dense(units = 30, activation = 'relu'))
model.add(Dense(units = 10, activation = 'sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=50, batch_size=10000)

So I have one input layer taking inputs of dimension 1024, i.e. (1024, ) (each 32 * 32 * 3 image is first converted to grayscale, giving 32 * 32 = 1024 pixels), five hidden layers, and one output layer, as defined in the code above.

When I train my model for 50 epochs, I get an accuracy of 0.9, or 90%. When I evaluate it on the test dataset, I also get approximately 90% accuracy. Here is the line of code that evaluates the model:

print (model.evaluate(X_test, y_test))

This prints the following loss and accuracy:

[1.611809492111206, 0.8999999761581421]

But when I calculate the accuracy manually, by making a prediction for each test image, I get around 11% (almost the same as random guessing, which for 10 classes would be 10%). Here is my code for the manual calculation:

wrong = 0

for x, y in zip(X_test, y_test):
  if not (np.argmax(model.predict(x.reshape(1, -1))) == np.argmax(y)):
    wrong += 1

print (wrong)
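
Equivalently, the same check can be written without the Python loop (just a vectorized restatement of the code above):

preds = model.predict(X_test)
wrong = np.sum(np.argmax(preds, axis = 1) != np.argmax(y_test, axis = 1))
print (wrong)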

This prints 9002 wrong predictions out of 10000. So what am I missing here? Why do the two accuracies mirror each other almost exactly (100 - 89 = 11%)? Any intuitive explanation would help. Thanks.

EDIT:

Here is my code which processes the dataset:
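
`unpickle` here is the standard loader from the CIFAR-10 download page, and `meta_data` (used further below) is the dictionary loaded from the batches.meta file; I've pulled both in for completeness:

import pickle
import numpy as np

def unpickle(file):
  with open(file, 'rb') as fo:
    return pickle.load(fo, encoding='bytes')

meta_data = unpickle('cifar-10-batches-py/batches.meta')  # contains b'num_cases_per_batch'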

# Process the training and testing data into a form the neural network can consume

# convert given colored image to grayscale
def rgb2gray(rgb):
  return np.dot(rgb, [0.2989, 0.5870, 0.1140])

X_train, y_train, X_test, y_test = [], [], [], []

def process_batch(batch_path, is_test = False):
  batch = unpickle(batch_path)
  imgs = batch[b'data']
  labels = batch[b'labels']


  for img in imgs:
    img = img.reshape(3,32,32).transpose([1, 2, 0])
    img = rgb2gray(img)
    img = img.reshape(1, -1)
    if not is_test:
      X_train.append(img)
    else:
      X_test.append(img)

  for label in labels:
    if not is_test:
      y_train.append(label)
    else:
      y_test.append(label)

process_batch('cifar-10-batches-py/data_batch_1')
process_batch('cifar-10-batches-py/data_batch_2')
process_batch('cifar-10-batches-py/data_batch_3')
process_batch('cifar-10-batches-py/data_batch_4')
process_batch('cifar-10-batches-py/data_batch_5')

process_batch('cifar-10-batches-py/test_batch', True)

number_of_classes = 10
number_of_batches = 5
number_of_test_batch = 1

X_train = np.array(X_train).reshape(meta_data[b'num_cases_per_batch'] * number_of_batches, -1)
print ('Shape of training data: {0}'.format(X_train.shape))

# convert labels to one-hot format
y_train = np.array(y_train)

y_train = np.eye(number_of_classes)[y_train]
print ('Shape of training labels: {0}'.format(y_train.shape))


# Process testing data

X_test = np.array(X_test).reshape(meta_data[b'num_cases_per_batch'] * number_of_test_batch, -1)
print ('Shape of testing data: {0}'.format(X_test.shape))

# convert labels to one-hot format
y_test = np.array(y_test)

y_test = np.eye(number_of_classes)[y_test]
print ('Shape of testing labels: {0}'.format(y_test.shape))
Kaushal28

1 Answer


This is happening because of the loss function you are using: binary cross-entropy, when you should be using categorical cross-entropy as the loss. Binary cross-entropy is only for two-class problems, but you have 10 classes here because of CIFAR-10.

The accuracy metric is in fact misleading you: because the loss is binary_crossentropy, Keras reports binary accuracy, computed element-wise over the 10 sigmoid outputs rather than over the predicted class. The solution is to retrain your model with categorical_crossentropy as the loss.
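For concreteness, a minimal sketch of the change. Swapping the final sigmoid for a softmax at the same time is the conventional pairing with this loss, so that the ten outputs form a probability distribution (the softmax swap is my suggestion; your original code used sigmoid):

model.add(Dense(units = 10, activation = 'softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

To see why the old setup reported 90%, here is a small illustration that mimics the element-wise binary accuracy Keras computes when the loss is binary_crossentropy (predictions thresholded at 0.5). Since 9 of the 10 entries in every one-hot label are 0, a degenerate network that outputs 0 everywhere already scores 90%:

import numpy as np

y_true = np.eye(10)[np.random.randint(0, 10, size = 1000)]  # 1000 one-hot labels
y_pred = np.zeros_like(y_true)                              # degenerate all-zero predictions
print (np.mean((y_pred > 0.5) == (y_true > 0.5)))           # prints 0.9 -- the misleading 90%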

This post has more details: Keras binary_crossentropy vs categorical_crossentropy performance?

Related: this post answers a different question, but its answer is essentially describing your problem: Keras: model.evaluate vs model.predict accuracy difference in multi-class NLP task

Edit

You mentioned in the comments that the accuracy of your model is hovering at around 10% and not improving. Upon examining your Colab notebook after the change to categorical cross-entropy, it appears that you are not normalizing your data. The pixel values are originally unsigned 8-bit integers, and although creating the training set promotes them to floating point, the large dynamic range of the data makes it hard for your neural network to learn the right weights: the gradient updates are so small that there are essentially no updates, and hence your network performs just like random chance. The solution is simply to divide your training and test data by 255 before you proceed:

X_train /= 255.0
X_test /= 255.0
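
(Note: the in-place division works here because rgb2gray already produced floating-point arrays. If you keep the raw uint8 pixels instead, cast first, e.g. X_train = X_train.astype('float32') / 255.0.)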

This transforms your data so that the dynamic range scales from [0, 255] down to [0, 1]. Your model will have an easier time training with the smaller dynamic range, which helps the gradients propagate rather than vanish as they would at the larger scale. Because your original model has a significant number of dense layers, with the unnormalized data the gradient updates most likely vanish, which is why the performance was poor initially.

When I run your notebook, I get 37% accuracy. This is not unexpected for CIFAR-10 with only a fully-connected / dense network. And when you run your notebook now, the evaluated accuracy and your manually computed fraction of wrong examples match.

If you want to increase accuracy, I have a couple of suggestions:

  1. Actually include colour information. Each object class in CIFAR-10 has a distinct colour profile that should help discrimination.
  2. Add convolutional layers (see the sketch after this list). I'm not sure how far along you are in your learning, but convolutional layers learn and extract the right features from the image, so that the most discriminative features are handed to the dense layers and classification accuracy improves. Right now you are classifying raw pixels, which is not advisable given how noisy they can be and how unconstrained the objects are (rotation, translation, skew, scale, etc.).
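
As a starting point, here is a minimal sketch of such a network. The layer sizes are illustrative guesses rather than tuned values, and it assumes you keep the RGB input from suggestion 1, i.e. inputs of shape (32, 32, 3), normalized to [0, 1] as above:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation = 'relu', input_shape = (32, 32, 3)))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Flatten())
model.add(Dense(units = 64, activation = 'relu'))
model.add(Dense(units = 10, activation = 'softmax'))  # probabilities over the 10 classes

model.compile(loss = 'categorical_crossentropy',
              optimizer = 'adam',
              metrics = ['accuracy'])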
rayryeng
  • I think you are correct. I just realized this and retrained with `categorical_crossentropy`, and now I'm getting 10% accuracy, which matches my manual calculation. But any idea why the accuracy is so low? – Kaushal28 Jun 02 '19 at 17:46
  • @Kaushal28 Please update your post with more detail, in particular how you load the dataset and how you reshape it so that it's suitable for a feedforward Dense network. – rayryeng Jun 02 '19 at 17:46
  • Thanks! Updated the question. Pardon me if the code is poorly written, as this is my first attempt at training a neural network and processing a dataset. – Kaushal28 Jun 02 '19 at 17:50
  • Let me share my entire notebook. – Kaushal28 Jun 02 '19 at 18:08
  • Wait. Let me share so you don't need to write the code. Here it is: https://colab.research.google.com/drive/1ztj06uZ7e447e7cEXwIWXLmamUP5Jpwb – Kaushal28 Jun 02 '19 at 18:11
  • @Kaushal28 Figured it out. You're not normalizing the data. – rayryeng Jun 02 '19 at 18:14
  • How to normalize in this case? Thanks for your time. – Kaushal28 Jun 02 '19 at 18:16
  • @Kaushal28 I've edited my answer. Also, I've created a fully self-contained notebook: it not only downloads the data on Colab, but also untars it so it's ready to go. I've used your code to extract the data. Also, you were missing the code that loads the metadata dictionary, which I've added: https://colab.research.google.com/drive/1dhFnP9PRCtLxVjCh-FgExkTCFT5NLmTr – rayryeng Jun 02 '19 at 18:23
  • Perfect! I downloaded all the data before writing the code so it wasn't there in my notebook. Thanks for the suggestions! – Kaushal28 Jun 02 '19 at 18:29
  • No problem! Good luck with your research! – rayryeng Jun 02 '19 at 18:30
  • Thanks! Any references explaining why normalization is a must in such cases? It confuses me that even after normalization the data is still floating point, just between 0 and 1. So why was the neural net struggling when the data was floating-point values in the 0-255 range? – Kaushal28 Jun 02 '19 at 18:35
  • Please see the edit. I added some explanations as to why, but this has everything to do with how the neural network weights are being updated. You don't need to do this if you have very few layers, but because you have 6 layers, you risk the updates disappearing as they propagate through the network. Try using just 1 or 2 layers and not normalizing and see what you get. You should get relatively the same accuracy as with using 6 layers. – rayryeng Jun 02 '19 at 18:39
  • @Kaushal28 https://becominghuman.ai/image-data-pre-processing-for-neural-networks-498289068258 - This is a good reference on pre-processing image data, including normalizing. – rayryeng Jun 02 '19 at 18:44
  • Thanks for all the help!! – Kaushal28 Jun 02 '19 at 18:46
  • @rayryeng And this is a golden answer - thank you so so much - I had thought this was a problem with unstable seeds/weight initialization and couldn't work out how and why the output classes were sometimes being inverted.... Thank you, thank you! – jtlz2 Aug 10 '21 at 08:50