19

I am developing a segmentation neural network with only two classes, 0 and 1 (0 is the background and 1 is the object I want to find in the image). In each image, about 80% of the pixels are 1 and 20% are 0. As you can see, the dataset is unbalanced and it skews the results. My accuracy is 85% and my loss is low, but that is only because my model is good at predicting the majority class!

I would like to base the optimization on another metric, such as precision or recall, which is more useful in this case.

Does anyone know how to implement this?

desertnaut
freshmanData
  • There are very **fundamental** (i.e. mathematical) reasons why our optimizers are based on loss, and not on measures like accuracy, precision, or recall; see my answer in [Cost function training target versus accuracy desired goal](https://stackoverflow.com/questions/47891197/cost-function-training-target-versus-accuracy-desired-goal/47910243#47910243) (it's about loss vs accuracy, but the same argument holds for the other measures as well). – desertnaut Aug 27 '18 at 15:00
  • Optimization is based on convex functions. You can't optimize precision or recall directly. You have to put them in the evaluation metrics and use them to get the best iteration. – Frayal Aug 27 '18 at 15:01
  • Thank you! Maybe my question was not well posed. What I meant was: how could I base my training on precision? Is it possible? Or should I only track precision over the epochs and hope for it to improve? – freshmanData Aug 27 '18 at 15:14
  • Not sure why you think we didn't understand your question; we did, and our answers above (both mine and @Alexis) hold (if you are still wondering, maybe our comments were not clear enough...) – desertnaut Aug 27 '18 at 15:22
  • No no, your comments were very clear, thanks! But I still wonder how I can get the precision or the recall higher with an unbalanced dataset. – freshmanData Aug 27 '18 at 15:26
  • This is a totally different question - you should open a new thread (preferably with your code included), asking exactly this, without messing with different optimizers etc... – desertnaut Aug 27 '18 at 15:29
  • Maybe stratify the data by bootstrapping the 0 class to focus training more on it? – Gerges Aug 27 '18 at 15:34
  • Another way: in your loss function, you can use a weighted loss where mistakes on the 0 class are penalized more (see the sketch after this comment thread). – Gerges Aug 27 '18 at 15:36
  • Thank you @GergesDib, adapting the loss function to my problem is a very good idea! I should penalize false negatives more heavily than I reward true negatives. Now I need to find a way to modify the loss function. – freshmanData Aug 27 '18 at 15:43
  • Maybe check this out, once you decide what the loss function should be. https://stackoverflow.com/questions/43818584/custom-loss-function-in-keras – Gerges Aug 27 '18 at 15:44
  • Thanks a lot @GergesDib! I will definitely check this out. – freshmanData Aug 27 '18 at 15:54
  • I think you need to use AUROC as your loss function. This gives equal importance to classifying both positive and negative examples, irrespective of class size, i.e. the loss will be based 50% on how well the positive class is classified and 50% on how well the negative class is classified. – Vikas NS Aug 27 '18 at 16:14
  • @VikasNS please read the answer closely; you cannot use AUROC as a *loss* function – desertnaut Oct 23 '19 at 11:46
  • For the sake of mention, you could create a loss function out of recall and accuracy, which are both between 0 and 1 and which you want to maximize; `2 - recall - accuracy` would work. However, recall and accuracy must be computed over the whole validation data, plus there are all the other issues that come along with convexity, gradients, etc. – Gregory Morse May 28 '23 at 02:13
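
To make the weighted-loss idea from the comments above concrete, here is a minimal sketch of a class-weighted binary cross-entropy in Keras. It is not code from any of the commenters; the weight `w_minority` is a hypothetical parameter you would tune or derive from the class frequencies.

    import tensorflow as tf
    from tensorflow.keras import backend as K

    def weighted_bce(w_minority=4.0):
        # Returns a loss that up-weights the cross-entropy of minority-class (here: class 0) pixels.
        def loss(y_true, y_pred):
            y_true = K.cast(y_true, 'float32')
            y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
            bce = -(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
            weights = y_true * 1.0 + (1 - y_true) * w_minority  # weight 1 for class 1, w_minority for class 0
            return K.mean(weights * bce)
        return loss

    # usage (hedged): model.compile(optimizer='adam', loss=weighted_bce(w_minority=4.0), metrics=['accuracy'])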

7 Answers

12

You don't use precision or recall as the quantity to optimize. You just track them as validation scores to pick the best weights. Do not mix up loss, optimizer, and metrics; they are not meant for the same thing.

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
import numpy as np

THRESHOLD = 0.5
def precision(y_true, y_pred, threshold_shift=0.5-THRESHOLD):

    # just in case 
    y_pred = K.clip(y_pred, 0, 1)

    # shifting the prediction threshold from .5 if needed
    y_pred_bin = K.round(y_pred + threshold_shift)

    tp = K.sum(K.round(y_true * y_pred_bin)) + K.epsilon()
    fp = K.sum(K.round(K.clip(y_pred_bin - y_true, 0, 1)))

    precision = tp / (tp + fp)
    return precision


def recall(y_true, y_pred, threshold_shift=0.5-THRESHOLD):

    # just in case 
    y_pred = K.clip(y_pred, 0, 1)

    # shifting the prediction threshold from .5 if needed
    y_pred_bin = K.round(y_pred + threshold_shift)

    tp = K.sum(K.round(y_true * y_pred_bin)) + K.epsilon()
    fn = K.sum(K.round(K.clip(y_true - y_pred_bin, 0, 1)))

    recall = tp / (tp + fn)
    return recall


def fbeta(y_true, y_pred, beta = 2, threshold_shift=0.5-THRESHOLD):   
    # just in case 
    y_pred = K.clip(y_pred, 0, 1)

    # shifting the prediction threshold from .5 if needed
    y_pred_bin = K.round(y_pred + threshold_shift)

    tp = K.sum(K.round(y_true * y_pred_bin)) + K.epsilon()
    fp = K.sum(K.round(K.clip(y_pred_bin - y_true, 0, 1)))
    fn = K.sum(K.round(K.clip(y_true - y_pred_bin, 0, 1)))  # use the thresholded predictions, as in recall above

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    beta_squared = beta ** 2
    return (beta_squared + 1) * (precision * recall) / (beta_squared * precision + recall) 


def model_fit(X, y, X_test, y_test):
    # weight the positive class inversely to its frequency in y
    class_weight = {
        1: 1 / (np.sum(y) / len(y)),
        0: 1,
    }
    np.random.seed(47)
    model = Sequential()
    model.add(Dense(1000, input_shape=(X.shape[1],)))
    model.add(Activation('relu'))
    model.add(Dropout(0.35))
    model.add(Dense(500))
    model.add(Activation('relu'))
    model.add(Dropout(0.35))
    model.add(Dense(250))
    model.add(Activation('relu'))
    model.add(Dropout(0.35))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=[fbeta, precision, recall])
    model.fit(X, y, validation_data=(X_test, y_test), epochs=200, batch_size=50, verbose=2, class_weight=class_weight)
    return model
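
To use these tracked metrics to keep the best weights (rather than to drive the optimizer), one common pattern is a `ModelCheckpoint` callback that monitors the validation value of the custom metric. A minimal sketch, assuming the `fbeta` metric defined above and an already-compiled `model`; the filename is arbitrary:

    from keras.callbacks import ModelCheckpoint

    # save the weights that achieve the best validation fbeta seen so far
    checkpoint = ModelCheckpoint('best_weights.h5',
                                 monitor='val_fbeta',     # 'val_' + the metric function's name
                                 mode='max',
                                 save_best_only=True,
                                 save_weights_only=True,
                                 verbose=1)

    model.fit(X, y, validation_data=(X_test, y_test),
              epochs=200, batch_size=50,
              callbacks=[checkpoint])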
Frayal
  • Here is [another method](https://stackoverflow.com/questions/42606207/keras-custom-decision-threshold-for-precision-and-recall/42607110). I don't know why these two pieces of code produce different results for the same threshold, and they are both different from the value I compute from the predict results (while keras_metrics.precision() returns the correct answer for a 0.5 threshold). – yang Mar 20 '19 at 04:13
  • [Recent research from 2017](https://arxiv.org/pdf/1608.04802.pdf) into this area has shown that it is possible to optimize statistics in the precision/recall family, like precision-at-fixed-recall, etc., by using new proxy loss functions. The authors reported relative improvements for the chosen metrics using the new proxy losses vs. the baseline loss functions. I wrote up an answer below to reflect these findings. It is exciting to see this field advancing so quickly! – J Trana Jan 18 '20 at 14:11
  • Why is the beta in fbeta hardcoded to two? Shouldn't it be a parameter? – Javi Hernandez Aug 26 '20 at 19:23
9

No. To do gradient descent, you need to compute a gradient. For this, the function needs to be reasonably smooth. Precision, recall, and accuracy are not smooth functions; they have only sharp edges, on which the gradient is infinite, and flat regions, on which the gradient is zero. Hence you cannot use any kind of gradient-based numerical method to find a minimum of such a function; you would have to use some kind of combinatorial optimization, and that would be NP-hard.
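
A tiny numerical illustration of the "flat with sharp edges" point (my own sketch, not part of the original answer): accuracy as a function of the raw scores is piecewise constant, so a small perturbation changes it by exactly zero almost everywhere, and it jumps abruptly when a score crosses the decision threshold.

    import numpy as np

    y_true = np.array([1, 0, 1, 1])
    scores = np.array([0.7, 0.4, 0.55, 0.2])

    def accuracy(s):
        return np.mean((s >= 0.5).astype(int) == y_true)

    print(accuracy(scores))                              # 0.75
    print(accuracy(scores + 1e-4))                       # still 0.75 -> zero "gradient"
    print(accuracy(scores + np.array([0, 0, 0, 0.31])))  # jumps to 1.0 once the last score crosses 0.5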

jlanik
  • Surrogate losses have been used for ages. Hinge loss and cross-entropy are bounds for the 0-1 classification loss. – dksahuji Oct 09 '22 at 03:39
6

As others have stated, precision/recall are not directly usable as loss functions. However, better proxy loss functions have been found that help with a whole family of precision/recall-related metrics (e.g. ROC AUC, precision at fixed recall, etc.)

The research paper Scalable Learning of Non-Decomposable Objectives covers this with a method to sidestep the combinatorial optimization by the use of certain calculated bounds, and some Tensorflow code by the authors is available at the tensorflow/models repository. Additionally, there is a followup question on StackOverflow that has an answer that adapts this into a usable Keras loss function.

Special thanks to Francois Chollet and other participants on the Keras issue thread here that turned up that research paper. You may also find that thread provides other useful insights into the problem at hand.
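
As a much simpler (and cruder) alternative to the bounds used in that paper, a common trick is a "soft" surrogate that keeps the predicted probabilities instead of thresholding them, which makes the quantity differentiable. This is a hedged sketch of a soft-F1 loss, not the paper's method and not the code from the linked threads:

    import tensorflow as tf

    def soft_f1_loss(y_true, y_pred, eps=1e-7):
        # "soft" counts: use probabilities instead of rounded 0/1 predictions
        y_true = tf.cast(y_true, tf.float32)
        tp = tf.reduce_sum(y_true * y_pred)
        fp = tf.reduce_sum((1.0 - y_true) * y_pred)
        fn = tf.reduce_sum(y_true * (1.0 - y_pred))
        soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
        return 1.0 - soft_f1  # minimize 1 - F1 to (softly) maximize F1

    # usage (hedged): model.compile(optimizer='adam', loss=soft_f1_loss)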

J Trana
1

Having the same problem with an unbalanced dataset, I'd suggest you use the F1 score as the metric of your optimizer. Andrew Ng teaches that having ONE metric for the model is the simplest (best?) way to train a model. If you have two metrics, like precision and recall, it's not clear which one is more important, and trying to set limits on one metric obviously impacts the other...

The F1 score combines recall and precision: it is their harmonic mean.

The Keras version I'm using unfortunately has no built-in implementation of the F1 score as a metric, as it does for accuracy and many other metrics: https://keras.io/api/metrics/.

I found an implementation of the F1 score as a Keras metric, evaluated at each epoch, at: https://medium.com/@aakashgoel12/how-to-add-user-defined-function-get-f1-score-in-keras-metrics-3013f979ce0d

I've implemented the simple function from the above article, and the model now reports the F1 score as a Keras metric during training. Results on the test set: accuracy went down a bit and the F1 score went up a lot.
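
For reference, another common pattern (not necessarily the exact code from the linked article) is to compute the F1 score on the full validation set at the end of each epoch with a callback, which avoids the batch-wise averaging issue of in-graph metrics. A hedged sketch using scikit-learn; `X_val`/`y_val` are placeholder names:

    from sklearn.metrics import f1_score
    from tensorflow.keras.callbacks import Callback

    class F1Callback(Callback):
        def __init__(self, X_val, y_val):
            super().__init__()
            self.X_val, self.y_val = X_val, y_val

        def on_epoch_end(self, epoch, logs=None):
            # threshold the sigmoid outputs at 0.5 and score the whole validation set
            y_pred = (self.model.predict(self.X_val) > 0.5).astype(int).ravel()
            print(" - val_f1: %.4f" % f1_score(self.y_val.ravel(), y_pred))

    # usage (hedged): model.fit(X, y, validation_data=(X_val, y_val), callbacks=[F1Callback(X_val, y_val)])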

Toren
  • In your linked post you use F1 as a metric, *not* as loss (i.e. the quantity to be optimized), which is still the binary cross-entropy. Loss & metric are quite different things, and not to be confused (see the accepted answer for details). – desertnaut Dec 17 '20 at 01:45
0

I have the same problem with an unbalanced dataset for binary classification, and I also want to increase recall (sensitivity). I found out that there is a built-in metric for recall in tf.keras, and it can be used in the compile statement as follows:

    from tensorflow.keras.metrics import Recall, BinaryAccuracy

    # BinaryAccuracy thresholds the sigmoid outputs at 0.5; Accuracy() would compare raw probabilities to the labels
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=[BinaryAccuracy(), Recall()])
Dharman
A.Fuentes
  • Whatever metric is chosen is irrelevant to what is being optimized, which is given by the `loss` argument (here binary cross-entropy). Loss & metric are completely different things, and they should not be confused (see the accepted answer for details). – desertnaut Dec 17 '20 at 01:47
0

The recommended approach to deal with an unbalanced dataset like yours is to use `class_weight` or `sample_weight`. See the model `fit` API for details.

Quote:

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.

With weights that are inversely proportional to the class frequencies, the loss will keep the model from simply predicting the majority class.

I understand that this is not how you formulated the question but imho it is the most practical approach to the issue you are facing.
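
As an illustration, here is a minimal sketch of building class weights that are inversely proportional to the class frequencies and passing them to `fit`. The variable names are placeholders; note also that for per-pixel segmentation targets you may need per-pixel `sample_weight` arrays rather than the `class_weight` dictionary, depending on your Keras version.

    import numpy as np

    # y: flat array of 0/1 labels
    freq_1 = y.mean()                  # fraction of samples labelled 1
    class_weight = {
        0: 1.0 / (1.0 - freq_1),       # the rarer class gets the larger weight
        1: 1.0 / freq_1,
    }

    model.fit(X, y, epochs=50, batch_size=32, class_weight=class_weight)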

Pedro Marques
0

I think that the callbacks and early-stopping mechanisms provide techniques that can get you as close as possible to what you want to achieve. Please read the following article by Jason Brownlee about early stopping (read it to the end!):

https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
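
For example, a minimal early-stopping sketch monitoring a tracked validation metric (here the built-in `Recall()`, whose default name is 'recall'); the patience value and variable names are placeholders:

    from tensorflow.keras.callbacks import EarlyStopping

    early_stop = EarlyStopping(monitor='val_recall',       # 'val_' + the metric's name
                               mode='max',                  # recall should increase
                               patience=10,
                               restore_best_weights=True)

    model.fit(X, y, validation_data=(X_val, y_val), epochs=200, callbacks=[early_stop])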

Maciej
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/30031647) – Peacepieceonepiece Oct 09 '21 at 12:36