I'm having trouble interpreting the output of the Keras model.fit() method.
The setup

    print(tf.version.VERSION)  # 2.3.0
    print(keras.__version__)   # 2.4.0
I have a simple feedforward model for a 3-class classification problem:
    def get_baseline_mlp(signal_length):
        input_tensor = keras.layers.Input(signal_length, name="input")
        dense_1 = keras.layers.Flatten()(input_tensor)
        dense_1 = keras.layers.Dense(name='dense_1', activation='relu', units=500)(dense_1)
        dense_1 = keras.layers.Dense(name='dense_2', activation='relu', units=500)(dense_1)
        dense_1 = keras.layers.Dense(name='dense_3', activation='relu', units=500)(dense_1)
        dense_1 = keras.layers.Dense(name='dense_4', activation='softmax', units=3,
                                     bias_initializer='zero')(dense_1)
        model = keras.models.Model(inputs=input_tensor, outputs=[dense_1])
        model.summary()
        return model
My training data are univariate time series, and my output is a one-hot encoded vector of length 3 (I have 3 classes in my classification problem).
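For context, the labels are encoded along these lines (a minimal sketch with made-up integer class ids; the manual `np.eye` indexing is equivalent to `keras.utils.to_categorical` for labels 0-2):

```python
import numpy as np

# Hypothetical integer labels for the 3 classes (0, 1, 2)
labels = np.array([0, 2, 1, 1])

# Manual one-hot encoding: row i of eye(3) is the one-hot vector for class i
one_hot = np.eye(3)[labels]
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 1. 0.]]
```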
The model is compiled as follows:

    mlp_base.compile(optimizer=optimizer,
                     loss='categorical_crossentropy',
                     metrics=['categorical_accuracy'])
I have a function that manually calculates the accuracy of my predictions in two ways:

    import numpy as np

    def get_accuracy(model, true_x, true_y):
        # Method 1: round each softmax output and compare the full one-hot vector
        res = model.predict(true_x)
        res = np.rint(res)
        right = 0
        for i in range(len(true_y[:, 0])):
            if np.array_equal(res[i, :], true_y[i, :]):
                right += 1
        tot = len(true_y[:, 0])
        print('True - total', right, tot)
        print('acc: {}'.format(right / tot))
        print()

        # Method 2: compare the argmax of prediction and label
        print(' method 2 - categorical')
        res = model.predict(true_x)
        res = np.argmax(res, axis=-1)
        true_y = np.argmax(true_y, axis=-1)
        right = 0
        for i in range(len(true_y)):
            if res[i] == true_y[i]:
                right += 1
        tot = len(true_y)
        print('True - total', right, tot)
        print('acc: {}'.format(right / tot))
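Note that the two methods themselves can disagree: element-wise rounding (method 1) marks a prediction wrong whenever no class probability reaches 0.5, while argmax (method 2) always commits to a class. A small numpy sketch with a made-up softmax output:

```python
import numpy as np

# A softmax output where no class probability reaches 0.5
pred = np.array([0.45, 0.35, 0.20])
true = np.array([1.0, 0.0, 0.0])

# Method 1: element-wise rounding gives all zeros, which never matches a one-hot label
rounded = np.rint(pred)
print(np.array_equal(rounded, true))       # False

# Method 2: argmax picks class 0, which matches the label
print(np.argmax(pred) == np.argmax(true))  # True
```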
The Problem
At the end of training, the reported categorical accuracy does not match the one I get from my custom function.
Training output:
Model: "functional_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 9000)] 0
_________________________________________________________________
flatten_8 (Flatten) (None, 9000) 0
_________________________________________________________________
dense_1 (Dense) (None, 500) 4500500
_________________________________________________________________
dense_2 (Dense) (None, 500) 250500
_________________________________________________________________
dense_3 (Dense) (None, 500) 250500
_________________________________________________________________
dense_4 (Dense) (None, 3) 1503
=================================================================
Total params: 5,003,003
Trainable params: 5,003,003
Non-trainable params: 0
    _________________________________________________________________
    Fit model on training data
    Epoch 1/2
    20/20 [==] - 0s 14ms/step - loss: 1.3796 - categorical_accuracy: 0.3250 - val_loss: 0.9240
    Epoch 2/2
    20/20 [==] - 0s 8ms/step - loss: 0.8131 - categorical_accuracy: 0.6100 - val_loss: 1.2811
Output of the accuracy function:

    True - total 169 200
    acc: 0.845

     method 2 - categorical
    True - total 182 200
    acc: 0.91
Why am I getting wrong results? Is my accuracy implementation wrong?
Update
After correcting the setup as desertnaut suggested, it is still not working.
Output of fit:

    Epoch 1/3
    105/105 [===] - 1s 9ms/step - loss: 1.7666 - categorical_accuracy: 0.2980
    Epoch 2/3
    105/105 [===] - 1s 6ms/step - loss: 1.2380 - categorical_accuracy: 0.4432
    Epoch 3/3
    105/105 [===] - 1s 5ms/step - loss: 1.0318 - categorical_accuracy: 0.5989
If I use Keras's own CategoricalAccuracy metric, I still get a different result:

    cat_acc = keras.metrics.CategoricalAccuracy()
    cat_acc.update_state(tr_y2, y_pred)
    print(cat_acc.result().numpy())  # outputs: 0.7211079
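As far as I understand, CategoricalAccuracy compares the argmax of predictions and targets and averages the matches; a pure-numpy equivalent of that behavior (my understanding, not the exact Keras source) looks like this:

```python
import numpy as np

def categorical_accuracy_np(y_true, y_pred):
    """Mean of argmax matches, mirroring keras.metrics.CategoricalAccuracy."""
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

# Toy example: 2 of 3 predictions pick the right class
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])
print(categorical_accuracy_np(y_true, y_pred))  # 0.6666...
```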
Interestingly, if I compute the validation accuracy with the same methods, the results are consistent.