
I need to define a function that generates a confusion matrix. I have two vectors, y_label and y_predict, whose elements are each 0, 1, or 2. The function should count how often each combination of labels occurs (rows: y_label, columns: y_predict):

  | 0 | 1 | 2 |
--------------
0 |   |   |   |
--------------
1 |   |   |   |
--------------
2 |   |   |   |
--------------

For example, cm[0, 1] should contain the number of indices i for which y_label[i] = 0 and y_predict[i] = 1.

So far, this is what I've done:

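import numpy as np

def get_confusion_matrix(y_label, y_predict):
    y_label, y_predict = np.asarray(y_label), np.asarray(y_predict)
    # np.zeros initializes the counts; np.ndarray([3, 3]) would leave them uninitialized.
    cm = np.zeros((3, 3), dtype=int)

    for i in range(3):
        for j in range(3):
            # Number of positions where the label is i and the prediction is j.
            cm[i, j] = np.sum((y_label == i) & (y_predict == j))

    return cm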

Of course, I could do the counting with nested for loops, but I'd like to avoid that if Python or NumPy offers a shortcut.

I'm also considering merging y_label and y_predict into an array of tuples and then counting the tuples with the dict/zip technique used here:

How to count the occurrence of certain item in an ndarray in Python?

The details are still a bit hazy in my head, though. Can you confirm whether this approach is possible?
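The tuple-counting idea does work. Below is a minimal sketch, assuming the labels are integers in range(n_classes); confusion_matrix_from_pairs is a hypothetical helper name, and collections.Counter stands in for a hand-built dict. zip pairs the i-th label with the i-th prediction, Counter tallies each pair, and the tallies are copied into a zero-filled matrix.

```python
from collections import Counter

import numpy as np


def confusion_matrix_from_pairs(y_label, y_predict, n_classes=3):
    # Hypothetical helper: count each (true, predicted) pair produced by zip.
    counts = Counter(zip(y_label, y_predict))
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Copy the counts into the matrix; pairs that never occur stay 0.
    for (i, j), n in counts.items():
        cm[i, j] = n
    return cm


y_label = np.array([2, 2, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0])
y_predict = np.array([2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 2])
cm = confusion_matrix_from_pairs(y_label, y_predict)
print(cm)
```

The remaining loop runs over at most n_classes² distinct pairs, not over the data itself.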

oikonomiyaki

3 Answers


You could use the confusion_matrix function from scikit-learn. It appears to produce exactly what you're after.

from sklearn.metrics import confusion_matrix
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
confusion_matrix(y_true, y_pred)
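One detail matters for the fixed 3 × 3 layout in the question: by default, confusion_matrix only includes the classes that appear in y_true or y_pred, so the matrix shrinks if a class never occurs. A minimal sketch with small made-up vectors in which class 2 never appears, showing how the labels parameter keeps the shape fixed:

```python
from sklearn.metrics import confusion_matrix

# Made-up example data in which class 2 never occurs.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# Default: rows/columns only for the classes actually present (0 and 1).
cm_observed = confusion_matrix(y_true, y_pred)
print(cm_observed.shape)  # (2, 2)

# labels fixes which classes get a row/column, and in what order.
cm_full = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm_full)
```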

Here's a quick way to create the confusion matrix, using numpy.add.at.

First, here's some sample data:

In [93]: y_label
Out[93]: array([2, 2, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0])

In [94]: y_predict
Out[94]: array([2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 2])

Create the array cm containing zeros, and then add 1 at each index (y_label[i], y_predict[i]):

In [95]: cm = np.zeros((3, 3), dtype=int)

In [96]: np.add.at(cm, (y_label, y_predict), 1)

In [97]: cm
Out[97]: 
array([[4, 1, 2],
       [3, 0, 2],
       [1, 2, 1]])
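The sketch below, using the same sample data, shows why np.add.at is used here and offers an alternative. Plain fancy-index assignment, cm[y_label, y_predict] += 1, is buffered: when the same (i, j) pair appears more than once, that cell is incremented only once. np.bincount is a different technique that gives the same matrix by encoding each pair (i, j) as the single integer i * n_classes + j. The variable n_classes is introduced here only for illustration.

```python
import numpy as np

y_label = np.array([2, 2, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0])
y_predict = np.array([2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 2])
n_classes = 3

# Pitfall: buffered fancy-index +=, so repeated pairs are counted once.
# Every cell that occurs at all ends up as 1.
cm_wrong = np.zeros((n_classes, n_classes), dtype=int)
cm_wrong[y_label, y_predict] += 1

# np.add.at is unbuffered, so repeated pairs accumulate.
cm_add_at = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm_add_at, (y_label, y_predict), 1)

# Alternative: flatten each (i, j) to i * n_classes + j, count the flat
# indices with bincount, then reshape back to a square matrix.
flat = y_label * n_classes + y_predict
cm_bincount = np.bincount(flat, minlength=n_classes * n_classes).reshape(
    n_classes, n_classes
)

print(cm_wrong)
print(cm_add_at)
print(cm_bincount)
```

minlength guarantees the full n_classes × n_classes shape even if the largest pair never occurs.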

SciPy 1.7.0 added the function scipy.stats.contingency.crosstab, which provides a convenient wrapper for the same calculation. It is like a pared-down version of the pandas crosstab function.

In [55]: from scipy.stats.contingency import crosstab

In [56]: y_label = np.array([2, 2, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0])

In [57]: y_predict = np.array([2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 2])

In [58]: (labels, _), table = crosstab(y_label, y_predict)

In [59]: table
Out[59]: 
array([[4, 1, 2],
       [3, 0, 2],
       [1, 2, 1]])
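By default, crosstab only builds rows and columns for values that actually occur. So if a class is missing from the data, the table is smaller than 3 × 3. A minimal sketch with made-up vectors in which class 2 never appears, using the levels argument to list the values to count for each input:

```python
import numpy as np
from scipy.stats.contingency import crosstab

# Made-up vectors in which class 2 never occurs.
y_label = np.array([0, 1, 1, 0])
y_predict = np.array([0, 1, 0, 0])

# Default: only the observed values (0 and 1) become rows/columns.
(row_vals, col_vals), table = crosstab(y_label, y_predict)

# levels gives class 2 a (zero) row and column, so the table stays 3 x 3.
(row_vals3, col_vals3), table3 = crosstab(
    y_label, y_predict, levels=([0, 1, 2], [0, 1, 2])
)
print(table)
print(table3)
```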
Warren Weckesser

Scikit-learn has a confusion_matrix function:

from sklearn.metrics import confusion_matrix
y_actu = [2, 2, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0]
y_pred = [2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 2]
confusion_matrix(y_actu, y_pred)

This returns a NumPy array:

array([[4, 1, 2],
       [3, 0, 2],
       [1, 2, 1]])
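If you want rates rather than raw counts, confusion_matrix also accepts a normalize argument (in scikit-learn 0.22 and later, as far as I know). A minimal sketch on the same data: normalize='true' divides each row by the number of samples with that actual label, so each row sums to 1 and the diagonal holds the per-class recall.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_actu = [2, 2, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0]
y_pred = [2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 2]

# Each row is divided by its row total (the count of that actual class).
cm_rates = confusion_matrix(y_actu, y_pred, normalize='true')

# Diagonal: fraction of each actual class that was predicted correctly.
print(np.round(np.diag(cm_rates), 3))
```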

For a more readable result with labeled rows and columns, you can use the crosstab function in pandas:

import pandas as pd
y_actu = pd.Series([2, 2, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0], name='Actual')
y_pred = pd.Series([2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 2], name='Predicted')
df_confusion = pd.crosstab(y_actu, y_pred)

This produces a pandas DataFrame like this:

Predicted  0  1  2
Actual            
0          4  1  2
1          3  0  2
2          1  2  1
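pd.crosstab can also append row and column totals. A minimal sketch on the same data, using margins=True. This adds an 'All' row and column, which here hold the row totals 7, 5, 4, the column totals 8, 3, 5, and the grand total 16.

```python
import pandas as pd

y_actu = pd.Series([2, 2, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0], name='Actual')
y_pred = pd.Series([2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 2], name='Predicted')

# margins=True appends an 'All' row and column holding the totals.
df_confusion = pd.crosstab(y_actu, y_pred, margins=True)
print(df_confusion)
```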

You can find a more complete answer in this question: How to write a confusion matrix in Python?