test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='categorical')
test_loss, test_acc = model.evaluate_generator(test_generator, steps=28)
print('test acc:', test_acc)

predict = model.predict_generator(test_generator, steps=28, verbose=0)
print('Prediction: ', predict)

test_imgs, test_labels = next(test_generator)

print(test_labels)

cm = confusion_matrix(test_labels, predict)

I have 2 problems with the above code. First, I get an error about having different numbers of samples between my test_labels and predict: my test_labels holds only 20 samples (one batch, as set by batch_size), while my predict from model.predict_generator holds all 560 images (20 × 28 steps).

ValueError: Found input variables with inconsistent numbers of samples: [20, 560]

The second problem is: how do I convert my softmax output (float probabilities over my 4 image classes) into integer class labels? I get an error when I change steps to 1 (to test only 20 samples instead of the total 560 from the problem above):

ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets

which I think happens because each prediction comes back as a list of 4 probabilities (one per class), e.g.

Prediction:  [[2.9905824e-12 5.5904431e-10 1.8195983e-11 1.0000000e+00]
 [2.7073351e-21 1.0000000e+00 8.3221777e-21 4.9091786e-22]
 [4.2152173e-05 6.1331893e-04 3.7486094e-05 9.9930704e-01]

Is there any way I can get the exact class my model predicts (as I do for my test loss and test accuracy)?

Or is there any other simple way to get the confusion matrix in Keras that I don't know of? :(

Edit1 (following desertnaut's answer): the returned test_labels variable is as below:

array([[0., 0., 0., 1.],
   [0., 0., 0., 1.],
   [0., 1., 0., 0.],
   [0., 1., 0., 0.],
   [0., 1., 0., 0.],
   [0., 0., 0., 1.],
   [1., 0., 0., 0.],
   [0., 0., 0., 1.],
   [0., 1., 0., 0.],
   [0., 0., 0., 1.],
   [0., 0., 0., 1.],
   [0., 1., 0., 0.],
   [0., 0., 0., 1.],
   [0., 0., 0., 1.],
   [0., 0., 1., 0.],
   [0., 1., 0., 0.],
   [0., 1., 0., 0.],
   [0., 0., 0., 1.],
   [0., 0., 0., 1.],
   [0., 0., 1., 0.]], dtype=float32), array([[1., 0., 0., 0.],
   [0., 0., 0., 1.],

^ This is only 1 cycle (there are 28 in total, so another 27 of these lists follow). This snippet is from somewhere in the middle of the output; the list is too long to show the topmost array (I can't scroll to the top of Spyder's output box). I tried using argmax as in the second problem above, e.g.

test_class = np.argmax(test_labels, axis=1)
test_class = test_class.tolist()
print(test_class)

But I didn't get the correct answer, I think because the loops differ: the output of predict_class as given by you is 1 list containing all 560 sample predictions, but test_labels comes as 28 separate batches. The output of predict_class looks like this, e.g.

[3, 1, 1, 2, 0, 0, 3, 1, 2, 0, 0, 1, 2, 2, 1, 3, 2, 2, 0, 2, 0, 3, 0, 1, 3, 3, 1, 2, 0, 1, 1, 0, 2, 1, 0, 2, 1, 3, 1, 0, 1, 2, 2, 2, 1, 2, 2, 2, 2, 3, 2, 3, 1, 3, 1, 1, 3, 2, 2, 0, 1, 1, 0, 2, 1, 3, 3, 2, 0, 1, 1, 0, 3, 0, 0, 2, 3, 2, 1, 1, 2, 3, 0, 0, 2, 1, 3, 2, 3, 1, 0, 0, 3, 0, 3, 1, 1, 3, 1, 0, 1, 2, 0, 0, 0, 0, 3, 2, 2, 3, 3, 1, 3, 0, 3, 2, 0, 0, 0, 2, 1, 0, 2, 2, 1, 0, 1, 2, 2, 2, 3, 2, 1, 2, 2, 0, 0, 2, 3, 3, 1, 2, 2, 3, 0, 2, 1, 1, 3, 0, 1, 0, 1, 3, 3, 1, 3, 0, 1, 3, 0, 2, 1, 1, 3, 0, 1, 0, 1, 1, 3, 2, 3, 3, 0, 1, 1, 3, 2, 0, 3, 2, 0, 1, 3, 3, 2, 1, 1, 1, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 1, 2, 2, 0, 0, 1, 1, 1, 0, 2, 2, 0, 3, 0, 3, 2, 2, 0, 1, 1, 1, 3, 0, 2, 2, 1, 3, 3, 3, 1, 2, 0, 3, 0, 0, 3, 1, 1, 3, 0, 2, 2, 2, 2, 3, 0, 2, 3, 0, 3, 2, 3, 2, 3, 3, 0, 0, 2, 3, 2, 0, 0, 3, 1, 3, 0, 0, 1, 1, 0, 1, 0, 0, 3, 0, 0, 1, 1, 3, 1, 3, 2, 1, 0, 1, 0, 2, 3, 0, 1, 2, 1, 2, 2, 2, 2, 0, 2, 2, 1, 3, 2, 2, 2, 1, 3, 3, 2, 0, 3, 0, 1, 2, 2, 2, 3, 1, 0, 2, 3, 2, 1, 0, 1, 2, 0, 2, 1, 2, 2, 2, 1, 0, 0, 0, 0, 0, 3, 3, 2, 1, 0, 0, 3, 0, 0, 2, 1, 0, 2, 3, 2, 3, 2, 1, 3, 0, 2, 1, 0, 0, 0, 1, 2, 2, 3, 2, 3, 2, 0, 3, 2, 1, 0, 0, 3, 2, 3, 0, 2, 0, 1, 0, 0, 3, 2, 3, 1, 3, 2, 2, 2, 0, 1, 2, 0, 2, 0, 0, 0, 3, 1, 3, 2, 3, 2, 1, 2, 3, 3, 1, 3, 3, 0, 1, 1, 2, 0, 1, 2, 3, 0, 2, 2, 2, 0, 0, 3, 0, 3, 3, 3, 3, 3, 3, 0, 1, 3, 0, 2, 3, 1, 0, 2, 3, 2, 3, 1, 1, 2, 1, 2, 3, 0, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 0, 0, 2, 0, 1, 0, 3, 1, 0, 0, 2, 1, 2, 3, 3, 2, 2, 1, 2, 2, 0, 2, 0, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 0, 3, 2, 2, 3, 0, 1, 3, 2, 3, 3, 0, 3, 1, 2, 3, 3, 0, 3, 3, 3, 2, 2, 0, 3, 3, 3, 0, 1, 1, 1, 0, 0, 0, 0, 1, 2, 2, 2, 3, 0, 0, 1, 1, 0, 2, 0, 2, 0, 3, 3, 1, 0, 2, 2, 1, 0, 0, 3, 0, 3, 3, 3]

^ 1 list of 560 samples.

The output of test_class (with the argmax edit) looks like this, e.g.

[[7, 3, 0, 2], [9, 3, 2, 0], [0, 2, 9, 6], [0, 2, 3, 1], [2, 3, 0, 1], [6, 0, 1, 4], [5, 0, 1, 2], [1, 3, 2, 0], [0, 2, 3, 5], [0, 1, 3, 7], [1, 0, 8, 4], [3, 7, 1, 0], [3, 5, 0, 2], [9, 0, 3, 1], [0, 2, 1, 9], [8, 5, 1, 0], [2, 0, 1, 8], [0, 5, 1, 3], [0, 17, 1, 4], [2, 1, 7, 0], [0, 4, 5, 1], [1, 2, 0, 4], [0, 2, 3, 1], [2, 0, 1, 3], [3, 2, 1, 0], [0, 2, 7, 6], [5, 0, 18, 2], [2, 0, 7, 1]]

Is there a function in numpy or scipy to turn this into 1 list of 560 samples instead of 28 lists of 20-sample batches?
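(For reference, NumPy can stack the batch arrays into one before taking argmax; a minimal sketch, with two dummy one-hot batches standing in for the 28 real ones:)

```python
import numpy as np

# Two dummy one-hot label batches standing in for the 28 batches of 20
batch1 = np.array([[0., 0., 0., 1.],
                   [0., 1., 0., 0.]])
batch2 = np.array([[1., 0., 0., 0.],
                   [0., 0., 1., 0.]])

# Stack all batches into a single (N, 4) array, then take argmax per row
all_labels = np.concatenate([batch1, batch2], axis=0)
test_class = np.argmax(all_labels, axis=1).tolist()
print(test_class)  # [3, 1, 0, 2]
```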

Edit2

Thanks! Both are now in 1 list. However, is there any way to check whether the samples are shuffled the same way? I obtained 87.8% classification accuracy, but the confusion matrix I get is very, very poor:

[[33 26 35 46]
 [43 25 41 31]
 [38 36 36 30]
 [32 30 39 39]]

2 Answers


For your second problem: since your predictions are probabilities over the classes, you should simply take the index of the maximum one; using your 3 shown predictions as an example:

import numpy as np
# your shown predictions:
predict = np.array( [[2.9905824e-12, 5.5904431e-10, 1.8195983e-11 ,1.0000000e+00],
                     [2.7073351e-21, 1.0000000e+00, 8.3221777e-21, 4.9091786e-22],
                     [4.2152173e-05, 6.1331893e-04, 3.7486094e-05, 9.9930704e-01]])
predict_class = np.argmax(predict, axis=1)
predict_class = predict_class.tolist()
predict_class
# [3, 1, 3]

Regarding your first problem: I assume you cannot independently get your test_labels for the whole of your dataset (otherwise you would presumably use that length-560 array for your confusion matrix); if so, you could use something like the following [updated after OP's edit]:

test_labels = []
for i in range(28):
    test_imgs, batch_labels = next(test_generator)
    batch_labels = np.argmax(batch_labels, axis=1).tolist()
    test_labels = test_labels + batch_labels

after which both your test_labels and predict_class will be lists of length 560, and you should be able to get the confusion matrix for the whole of your test set as

cm = confusion_matrix(test_labels, predict_class)

To ensure that the predictions and test labels are indeed aligned, you should add the shuffle=False argument to your test_datagen.flow_from_directory() (default value is True - docs).

Given the confusion matrix, if you need further classification measures like precision, recall etc, have a look at my answer here.
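For instance, scikit-learn's classification_report prints per-class precision, recall, and F1 directly from the two label lists; a small sketch with dummy class indices (0–3, as in the question):

```python
from sklearn.metrics import confusion_matrix, classification_report

# Dummy ground-truth and predicted class indices (4 classes)
y_true = [3, 1, 0, 2, 3, 1]
y_pred = [3, 1, 0, 3, 3, 2]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(classification_report(y_true, y_pred))
```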

  • Hi sir, thanks for the feedback! I solved the second problem using your answer. However, for the first one (the range(28) loop) I get 28 lists of 20 predictions, whereas predict_class is 1 list of 560. I tried using the argmax function as in your second answer, but I get a weird result, e.g. [[0, 1, 9, 5], [0, 13, 5, 2], [1, 0, 2, 3], (there's a total of 28 of these lists) – Jon Salji Feb 21 '18 at 16:21
  • @JonSalji you are welcome. Please edit your post to add a sample of the `test_labels` variable as returned in your own initial code (or the `batch_labels` of my loop - they should be the same) – desertnaut Feb 21 '18 at 16:33
  • Thanks! I've edited my post with the returned variable that I get. Is there any function in Python to compact the 28 lists into 1 list? – Jon Salji Feb 21 '18 at 16:56
  • @JonSalji so, your test labels are also one-hot encoded! Didn't see that coming - standby to update my answer – desertnaut Feb 21 '18 at 16:57
  • Thanks! The code you gave converts it to 1 list. Is there any way to check that the shuffle used for evaluation is the same for both of them? The confusion matrix I get is very bad (see edit). Does it have to do with the RNG seed? – Jon Salji Feb 21 '18 at 17:13

You can also use scikit-learn's LabelBinarizer:

import numpy as np
# your shown predictions:
predict = np.array( [[2.9905824e-12, 5.5904431e-10, 1.8195983e-11 ,1.0000000e+00],
                     [2.7073351e-21, 1.0000000e+00, 8.3221777e-21, 4.9091786e-22],
                     [4.2152173e-05, 6.1331893e-04, 3.7486094e-05, 9.9930704e-01]])

labels = ['class1', 'class2', 'class3', 'class4']
from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
lb.fit(labels)

predict_class = lb.inverse_transform(predict)
print(predict_class)
# ['class4', 'class2', 'class4']