
How to calculate the average TPR, TNR, FPR, and FNR in the case of an imbalanced dataset?

Example FPR: [3.54224720e-04 0.00000000e+00 1.59383505e-05 0.00000000e+00]. So, can I just sum the values of the 4 classes and divide by 4?

TPR: [3.54224720e-04 + 0.00000000e+00 + 1.59383505e-05 + 0.00000000e+00] / 4 = 0.99966?

And how is 3.54224720e-04 read — is it equal to 0.000354224720?

Thank you

import numpy as np

# matrix is the multiclass confusion matrix (rows: true labels, columns: predictions)
FP = np.sum(matrix, axis=0) - np.diag(matrix)
FN = np.sum(matrix, axis=1) - np.diag(matrix)
TP = np.diag(matrix)
TN = np.sum(matrix) - (FP + FN + TP)

# True Positive rate
TPR = TP/(TP+FN)
print("TPR:", TPR)
# True Negative Rate
TNR = TN/(TN+FP)
print("TNR:", TNR)
# False Positive Rate
FPR = FP/(FP+TN)
print("FPR:", FPR)
# False Negative Rate
FNR = FN/(TP+FN)
print("FNR:", FNR)

# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
print("ACC :", ACC)
delwar.naist
  • Does the R in TPR stand for recall? – Mukul Jan 30 '20 at 12:10
  • R is for Rate: True Positive Rate. – delwar.naist Jan 30 '20 at 12:12
  • 1. Do you have multiple TPRs from different runs that you want to combine into one? 2. What do you mean by an imbalanced dataset, and why should it matter for the calculation? – Yasi Klingler Jan 30 '20 at 12:20
  • In my dataset there are 4 classes. If you see the screenshot, the TPR is computed separately for each class. I am a little bit confused about whether there is any difference between the balanced and imbalanced calculation. I did not take the accuracy from here; because of the imbalanced dataset, I did a separate calculation of correctly classified instances against incorrectly classified ones: print("Accuracy Score:" + str(accuracy_score(yTest.argmax(axis=1), yPred.argmax(axis=1)))). Thank you – delwar.naist Jan 30 '20 at 12:22

1 Answer


There are different ways of averaging these metrics. If you check packages such as sklearn, you will see that there are multiple options you can pass: micro, macro, weighted, etc.

If you want to calculate them manually, one way (micro) is to take the separate TP, FN, FP, and TN counts from your four classes, sum them up, and then calculate your metrics from the pooled counts, as in the sketch below.
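For example, a minimal sketch of the micro approach, reusing the per-class TP, FN, FP, and TN arrays from the question:

import numpy as np

# Micro average: pool the counts over all classes first, then compute the rates
micro_TPR = np.sum(TP) / (np.sum(TP) + np.sum(FN))
micro_FPR = np.sum(FP) / (np.sum(FP) + np.sum(TN))
print("micro TPR:", micro_TPR)
print("micro FPR:", micro_FPR)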

So, you should really understand your problem and see which averaging makes sense. In the case of imbalanced data, it is mostly better to use the weighted average. Keep in mind that if you have any baseline calculation, you have to use the exact same averaging method for these values to give a fair comparison, since there can be huge differences between the different ways of averaging.

And yes, those two numbers are equal: 3.54224720e-04 is scientific notation for 0.000354224720 (e-04 shifts the decimal point four places to the left).
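You can verify this directly in Python:

print(3.54224720e-04 == 0.000354224720)  # True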

Update:

As the documentation shows:

Weighted average: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
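With sklearn you can get this directly; for example (assuming y_true and y_pred hold the integer class labels):

from sklearn.metrics import recall_score

# average='weighted' weights each class's recall (TPR) by its support
print(recall_score(y_true, y_pred, average='weighted'))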

This question is also related.

In your case, for the weighted metrics you calculate each metric for each of your 4 classes separately. Then, using the number of instances in each class, you calculate the weighted average of the metric. The equation for the weighted precision is:

weighted precision = (n_1 * P_1 + n_2 * P_2 + n_3 * P_3 + n_4 * P_4) / (n_1 + n_2 + n_3 + n_4)

where n_i is the support (number of true instances) of class i and P_i is the precision for class i.
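Manually, the same weighting works for any of the per-class rates; a minimal sketch using the TPR array and confusion matrix from the question:

import numpy as np

# Support = number of true instances per class (row sums of the confusion matrix)
support = np.sum(matrix, axis=1)

# Weighted average of the per-class true positive rates
weighted_TPR = np.sum(TPR * support) / np.sum(support)
print("weighted TPR:", weighted_TPR)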

Yasi Klingler