
I would like to calculate a confusion matrix for two text files. Does anyone know of a library or tool, in either Python or shell script, that can do this?

For example, I have two files:

FILE A:

1
1
2
2

FILE B:

2
2
2
2

From these I would get the following confusion matrix (rows are the labels in FILE A, columns are the labels in FILE B):

   1   2
--------
1| 0   2
2| 0   2

Update: Note that the desired output above includes row and column labels.

badner
  • I would appreciate if you could have a look at this dear: https://stackoverflow.com/questions/44215561/python-creating-confusion-matrix-from-multiple-csv-files – Mahsolid May 27 '17 at 14:42

1 Answer


This is probably overkill, but scikit-learn will do that pretty easily:

from sklearn.metrics import confusion_matrix

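# Read the data: one label per line (file1 = true labels, file2 = predictions)
with open('file1', 'r') as infile:
    true_values = [int(line) for line in infile if line.strip()]  # skip blank lines
with open('file2', 'r') as infile:
    predictions = [int(line) for line in infile if line.strip()]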

# Make confusion matrix
confusion = confusion_matrix(true_values, predictions)

print(confusion)

With output

[[0 2]
 [0 2]]

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
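For comparison, here is a minimal dependency-free sketch that uses only the Python standard library (`collections.Counter`), for cases where installing scikit-learn is not an option. The function name `confusion_from_lines` is hypothetical. It follows the same layout as `confusion_matrix` (rows are true labels from the first input, columns are predicted labels from the second, both sorted), but it assumes labels are compared as strings, so the order is lexicographic (`'10'` sorts before `'2'`).

```python
from collections import Counter


def confusion_from_lines(true_lines, pred_lines):
    """Build a confusion matrix from two parallel sequences of labels.

    Rows are true labels, columns are predicted labels, both in sorted
    (string) order. Returns (labels, matrix) with matrix as a list of lists.
    """
    # Strip whitespace/newlines and skip blank lines (e.g. a trailing newline)
    true_labels = [line.strip() for line in true_lines if line.strip()]
    pred_labels = [line.strip() for line in pred_lines if line.strip()]
    if len(true_labels) != len(pred_labels):
        raise ValueError("both inputs must contain the same number of labels")

    # Every label seen in either input gets a row and a column
    labels = sorted(set(true_labels) | set(pred_labels))
    # Count how often each (true, predicted) pair occurs
    counts = Counter(zip(true_labels, pred_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# The example from the question, given as lists of lines
labels, matrix = confusion_from_lines(["1", "1", "2", "2"], ["2", "2", "2", "2"])
print(labels)  # ['1', '2']
print(matrix)  # [[0, 2], [0, 2]]
```

To read the files directly, pass the open file objects, e.g. `with open('file1') as a, open('file2') as b: labels, matrix = confusion_from_lines(a, b)`.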

Update: To print with labels, you could either convert the matrix to a pandas DataFrame or use something like this:

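def print_confusion(confusion, labels=None):
    # Default to 0-based indices; pass the real labels (e.g. [1, 2]) to show those instead
    if labels is None:
        labels = range(confusion.shape[0])
    print('   ' + '  '.join(str(n) for n in labels))
    for label, row in zip(labels, confusion):
        print(str(label) + '  ' + '  '.join(str(n) for n in row))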

which prints

   0  1
0  0  2
1  0  2
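The printout above shows the matrix indices 0 and 1 rather than the labels 1 and 2 from the question, and its columns drift out of alignment once a count has more than one digit. A minimal sketch of a formatter that takes the real labels and pads every cell to a shared width (the name `format_confusion` is hypothetical). It assumes `labels` is in the same order as the matrix; with scikit-learn's default, that is the sorted set of labels found in either input.

```python
def format_confusion(confusion, labels):
    """Return a confusion matrix as text with the real labels on both axes.

    `confusion` may be a list of lists or a 2-D NumPy array; `labels` must be
    in the same order as its rows and columns.
    """
    names = [str(label) for label in labels]
    cells = [[str(v) for v in row] for row in confusion]
    # One shared width so labels and multi-digit counts line up
    width = max(len(s) for s in names + [c for row in cells for c in row])
    header = " " * width + " | " + " ".join(n.rjust(width) for n in names)
    lines = [header, "-" * len(header)]
    for name, row in zip(names, cells):
        lines.append(name.rjust(width) + " | " + " ".join(c.rjust(width) for c in row))
    return "\n".join(lines)


print(format_confusion([[0, 2], [0, 2]], [1, 2]))
# prints:
#   | 1 2
# -------
# 1 | 0 2
# 2 | 0 2
```

With the scikit-learn code above, a call such as `format_confusion(confusion, sorted(set(true_values) | set(predictions)))` would label the rows and columns 1 and 2.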
kgully
  • yeah that works well. I would still like to avoid having to install sklearn all the time because it comes with a load of dependencies. – badner Oct 25 '16 at 22:02
  • Do you know how to get the column and row labels to print as well? I used numpy.savetxt(outfile, confusion, delimiter=",", fmt='%s') – badner Oct 26 '16 at 21:57
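On the question about `numpy.savetxt`: it writes the array values without row labels, so the labels have to be added by hand. One way, sketched here with the standard `csv` module (the function name `write_confusion_csv` is hypothetical), is to write a header row of predicted labels and prefix each following row with its true label:

```python
import csv
import sys


def write_confusion_csv(confusion, labels, outfile):
    """Write a labeled confusion matrix as CSV to an open text file.

    The first row holds the predicted labels (after an empty corner cell);
    each following row starts with its true label, then the counts.
    """
    writer = csv.writer(outfile)
    writer.writerow([""] + [str(label) for label in labels])  # header row
    for label, row in zip(labels, confusion):
        writer.writerow([str(label)] + [str(v) for v in row])


# Example with the matrix from the question; prints ",1,2", "1,0,2", "2,0,2" (one per line)
write_confusion_csv([[0, 2], [0, 2]], [1, 2], sys.stdout)
```

When writing to a file instead, open it with `newline=''` as the `csv` module documentation recommends, e.g. `open('confusion.csv', 'w', newline='')`.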