
I use a CNN for image classification on an unbalanced dataset. I'm totally new to the TensorFlow backend. It's a multiclass problem (not multilabel) with 16 classes, and the classes are one-hot encoded.

I want to compute MACRO metrics for each epoch: F1, precision and recall.

I found code to print those macro metrics, but it only works on the validation set. From: https://medium.com/@thongonary/how-to-compute-f1-score-for-each-epoch-in-keras-a1acd17715a2

import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import f1_score, precision_score, recall_score

class Metrics(Callback):

    def on_train_begin(self, logs={}):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs={}):
        # Predict on the validation set and round the probabilities to 0/1
        val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()
        val_targ = self.validation_data[1]
        _val_f1 = f1_score(val_targ, val_predict, average='macro')
        _val_recall = recall_score(val_targ, val_predict, average='macro')
        _val_precision = precision_score(val_targ, val_predict, average='macro')
        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)
        print(" — val_f1: %f — val_precision: %f — val_recall %f" % (_val_f1, _val_precision, _val_recall))
        return

metrics = Metrics()

I'm not even sure this code really works, since it uses

 val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()

Could round() lead to errors in the multiclass case?

And I use the code below to print the metrics (only recall, since that is the important metric for me) on the training set (it is also computed on the validation set, since the metric is passed to model.compile). The code has been adapted from: Custom macro for recall in keras



from keras import backend as K

def recall(y_true, y_pred):
    # Per-class recall: true positives / all actual positives of that class
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())

def unweightedRecall(y_true, y_pred):
    # Macro (unweighted) recall: average the per-class recall over the 16 classes
    return sum(recall(y_true[:, i], y_pred[:, i]) for i in range(16)) / 16.



I run my model with

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=[unweightedRecall, "accuracy"])  # model compilation with the unweightedRecall metric

train = model.fit_generator(image_gen.flow(train_X, train_label, batch_size=64),
                            steps_per_epoch=len(train_X) / 64,
                            epochs=100,
                            verbose=1,
                            validation_data=(valid_X, valid_label),
                            class_weight=class_weights,
                            callbacks=[metrics])  # run the model
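
For reference, this is how I plan to use the metric to keep the best weights (a sketch, assuming the standard Keras ModelCheckpoint and the logged metric name val_unweightedRecall; the filename is just a placeholder):

from keras.callbacks import ModelCheckpoint

# Save the weights whenever the validation macro recall improves
checkpoint = ModelCheckpoint("best_weights.h5",
                             monitor="val_unweightedRecall",
                             mode="max",           # higher recall is better
                             save_best_only=True,
                             verbose=1)
# then pass callbacks=[metrics, checkpoint] to fit_generator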

The VALIDATION macro recall differs between the two pieces of code.

i.e. (compare val_unweightedRecall and val_recall):

Epoch 10/100
19/18 [===============================] - 13s 703ms/step - loss: 1.5167 - unweightedRecall: 0.1269 - acc: 0.5295 - val_loss: 1.5339 - val_unweightedRecall: 0.1272 - val_acc: 0.5519
 — val_f1: 0.168833 — val_precision: 0.197502 — val_recall 0.15636

Why do I get different values for my macro validation recall from the two different pieces of code?

Bonus question: for those who have already tried this, is it really worth using a custom loss based on the metric of interest (recall, for example), or does categorical cross-entropy with class weights produce the same result?

akhetos

1 Answer


Let me answer both questions, but in the opposite order:

You can't use recall as the basis for a custom loss: it is not convex, and since it is built from hard class decisions it provides no useful gradient to descend. If you do not fully understand why recall, precision or F1 can't be used as a loss, please take the time to review the role of the loss function (it is, after all, a central part of your model).
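
To make this concrete, here is a small sketch (hypothetical numbers; scikit-learn used only for illustration): nudging the predicted probabilities slightly, as gradient descent would, does not change recall at all until a hard decision flips, so its gradient is zero almost everywhere.

import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([0, 1, 1, 2])

# Two slightly different probability outputs with the same argmax decisions
probs_a = np.array([[0.6, 0.3, 0.1],
                    [0.2, 0.7, 0.1],
                    [0.4, 0.5, 0.1],
                    [0.1, 0.2, 0.7]])
probs_b = probs_a + 0.01   # small perturbation, same hard decisions

for probs in (probs_a, probs_b):
    preds = probs.argmax(axis=1)
    print(recall_score(y_true, preds, average='macro'))  # identical values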

Indeed, the round is intended for a binary problem: if it's not one class, then it's the other. But in your case it's wrong. Let's go through the code:

val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()

From the inside out, it takes the data (self.validation_data[0]) and predicts the outputs. In the original binary setting there is one output neuron giving the probability of being a 1: if that probability is over 0.5, the round turns it into a 1; if it is under, it turns it into a 0. As you can see, this is wrong for you: with a 16-way softmax, rounding each probability independently means that in some cases you won't predict any class at all. Following this mistake, the rest is also wrong.
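
A tiny NumPy illustration of what goes wrong (hypothetical probabilities): with a softmax over several classes, no single probability may exceed 0.5, so rounding predicts no class at all, whereas argmax always picks one.

import numpy as np

probs = np.array([[0.35, 0.30, 0.20, 0.15]])  # softmax output for one sample
print(probs.round())          # [[0. 0. 0. 0.]] -> no class predicted
print(probs.argmax(axis=1))   # [0] -> the class you actually want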

Now, the solution. You want to compute the mean recall at every epoch. By the way, regarding "but it only works on the validation set": yes, that is intended. You use the validation set to validate the model, not the training set; otherwise it is cheating.

So recall is true positives over all actual positives. Let's do that!

from keras import backend as K

def recall(y_true, y_pred):
    # Macro recall: average the per-class recall over the 16 classes
    recall = 0
    pred = K.argmax(y_pred, axis=-1)
    true = K.argmax(y_true, axis=-1)
    for i in range(16):
        p = K.cast(K.equal(pred, i), 'float32')
        t = K.cast(K.equal(true, i), 'float32')
        # Compute the true positives for class i (predicted i and actually i)
        common = K.sum(p * t)
        # Divide by all actual positives of class i
        recall += common / (K.sum(t) + K.epsilon())
    return recall / 16

This gives you the mean recall over all classes. You could also print the value for every class.
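
If you want the per-class values logged every epoch, one way (a sketch, with a hypothetical class_recall factory) is to register one small metric per class next to the macro one:

def class_recall(i):
    # Recall of class i only: true positives of i / actual positives of i
    def recall_i(y_true, y_pred):
        p = K.cast(K.equal(K.argmax(y_pred, axis=-1), i), 'float32')
        t = K.cast(K.equal(K.argmax(y_true, axis=-1), i), 'float32')
        return K.sum(p * t) / (K.sum(t) + K.epsilon())
    recall_i.__name__ = 'recall_%d' % i   # so Keras logs each class under its own name
    return recall_i

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=[recall] + [class_recall(i) for i in range(16)])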

Tell me if you have any questions!

For an implementation of binary recall, see this question, from which the code is adapted.

Frayal
  • Merci! Maybe I'm wrong, but it looks like you are computing **precision** (true positives / predicted positives) instead of recall. Am I right, or do I just not understand what you did? You are right that I don't really want these metrics for the training set, but I do want to be able to compute them when I compile the model, so I can use them to save the best weights, which was not possible with the first code; it is possible with your code. – akhetos May 28 '19 at 07:16
  • 1
    well i do use recall. Recall is a mesure of how many true postive you get over all positives. Precision is how many True positives you have over the number of postives you have predicted. i sum t which is "real" positives (summing p would give the precision) – Frayal May 28 '19 at 09:39
  • Do you know how to write a function to track the micro F1 score? – akhetos Jun 03 '19 at 07:33
  • 1
    compute recall and prediction (replace `K.sum(t)` by `K.sum(p)`) and then use the formula of the f beta to combine those two. but i'm not sure if you need to compute the F1 of the mean recall and precision or the mean of F1 – Frayal Jun 03 '19 at 07:36
  • In your code we compute recall for each class and then average it. For the micro F1 score I should compute recall over all classes at the same time, but I can't figure out how to do it. – akhetos Jun 03 '19 at 07:40
  • 1
    well the recall variable can be a list and you replace the += by a .append(). that way you will retrieve all recalls. – Frayal Jun 03 '19 at 07:44
  • @akhetos why did you reopen the question by removing the accept mark? If you have a new issue, please open a new question. – Frayal Jul 10 '19 at 08:47
  • I'm sorry, but even though your code looks good to me I can't make it work. I'll run it again in a few days and tell you what doesn't work. – akhetos Jul 10 '19 at 08:51