
I have two arrays like:

correct = [['*','*'],['*','PER','*','GPE','ORG'],['GPE','*','*','*','ORG']]
predicted = [['PER','*'],['*','ORG','*','GPE','ORG'],['PER','*','*','*','MISC']]

The lengths of correct and predicted are the same (10K+), and the inner lists at each position also have matching lengths. I want to calculate the precision, recall, and F1 score for these two arrays using Python. I have the following 6 classes: 'PER','ORG','MISC','LOC','*','GPE'

I want to calculate precision and recall for 5 of the classes (all except '*'), and also the F1 score. What is an efficient way to do this in Python?

desertnaut
Harsh2093
    Why a list of lists, with the inner lists having different lengths (2 & 5)?? – desertnaut Apr 24 '18 at 09:17
  • These are the results of an NER model. The inner lists represent sentences, so in the above case the 1st sentence has 2 words and the next has 5; hence the lengths of the inner lists. – Harsh2093 Apr 24 '18 at 09:21

1 Answer


You have to flatten your lists of lists into flat label sequences, and then use classification_report from scikit-learn, passing the labels argument to restrict the report to the 5 classes of interest (note that target_names only renames classes in the report; it does not select them):

from sklearn.metrics import classification_report

correct = [['*','*'],['*','PER','*','GPE','ORG'],['GPE','*','*','*','ORG']]
predicted = [['PER','*'],['*','ORG','*','GPE','ORG'],['PER','*','*','*','MISC']]
labels = ['PER','ORG','MISC','LOC','GPE']  # leave out '*'

# flatten the nested lists into flat label sequences
correct_flat = [item for sublist in correct for item in sublist]
predicted_flat = [item for sublist in predicted for item in sublist]

# labels= restricts the report to these 5 classes, in this order
print(classification_report(correct_flat, predicted_flat, labels=labels))

Result:

             precision    recall  f1-score   support

        PER       0.00      0.00      0.00         1
        ORG       0.50      0.50      0.50         2
       MISC       0.00      0.00      0.00         0
        LOC       0.00      0.00      0.00         0
        GPE       1.00      0.50      0.67         2

avg / total       0.60      0.40      0.47         5

In this particular example, you will also get a warning:

UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.

which is due to 'MISC' and 'LOC' not being present in the true labels here (correct), but arguably this should not happen in your real data.
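If you need the per-class numbers programmatically rather than as printed text, scikit-learn's precision_recall_fscore_support accepts the same labels argument and returns arrays in that label order (a minimal sketch; the flattening is repeated so it runs standalone):

```python
from sklearn.metrics import precision_recall_fscore_support

correct = [['*','*'],['*','PER','*','GPE','ORG'],['GPE','*','*','*','ORG']]
predicted = [['PER','*'],['*','ORG','*','GPE','ORG'],['PER','*','*','*','MISC']]
labels = ['PER','ORG','MISC','LOC','GPE']  # '*' excluded via labels=

correct_flat = [item for sublist in correct for item in sublist]
predicted_flat = [item for sublist in predicted for item in sublist]

# average=None -> one value per entry of `labels`, in that order
precision, recall, f1, support = precision_recall_fscore_support(
    correct_flat, predicted_flat, labels=labels, average=None)

for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{name:>5}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```

Passing average='weighted' instead of None gives the single averaged numbers shown in the report's bottom row.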

desertnaut