
I have an ML use case with two classes, 0 and 1, assigned to a given text.

  • Class-0: Can afford some misclassifications
  • Class-1: Very Important, can't afford any misclassifications

There's a large imbalance between the two classes: about 30,000 samples for class-0 and only 1,000 for class-1.

For the train-test split, I stratify on the labels so that the 70% train / 30% test ratio is maintained within each class.
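For reference, a minimal sketch of such a stratified split, assuming scikit-learn's `train_test_split` is used. The `texts` and `labels` lists are hypothetical stand-ins that only mirror the class counts described above, not the real data.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 30,000 class-0 texts and 1,000 class-1 texts.
texts = [f"doc {i}" for i in range(31000)]
labels = [0] * 30000 + [1] * 1000

# stratify=labels keeps the 70/30 ratio within each class.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, stratify=labels, random_state=0
)

train_counts = Counter(y_train)
test_counts = Counter(y_test)
print(train_counts, test_counts)
```

Checking the per-class counts after the split is a cheap way to confirm that the minority class ended up in both partitions in the intended proportion.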

I want to tune hyperparameters so that precision or recall for class-1 improves. I tried 'f1_macro', 'precision', and 'recall' as scoring metrics in GridSearchCV, both individually and combined, but this was not very helpful because most samples belong to class-0.

I'm exploring safe ways to reduce the class-0 data, although it can only be reduced by a small amount. In any case, with or without tuning, class-0 always has an F1-score above 98%.

So all I care about tuning is class-1.

Can you please suggest a way, perhaps a custom callable metric, to focus only on class-1's precision, recall, or F1-score?

I'm using the latest stable version of scikit-learn.

A similar problem is described here: the author is trying to tune class-1's F1 score for a neural network (MLP) in Keras.
Customizing the metric was suggested there, but without explaining how.
Anyone who can answer this for scikit-learn may also be able to answer the linked Keras question: Hyperparameter tuning in Keras (MLP) via RandomizedSearchCV

1 Answer


Using class_weight='balanced' helps here.
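To illustrate why this helps: the scikit-learn documentation describes the 'balanced' mode as weighting each class by `n_samples / (n_classes * np.bincount(y))`. The sketch below reproduces that arithmetic in plain Python for the class counts from the question. The helper name `balanced_class_weights` is hypothetical and not part of scikit-learn.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Class counts taken from the question: 30,000 vs. 1,000 samples.
labels = [0] * 30000 + [1] * 1000
weights = balanced_class_weights(labels)
print(weights)
```

With these counts, class-1 gets a weight of 15.5 and class-0 roughly 0.52, so an error on a class-1 sample costs about 30 times more than an error on a class-0 sample during training.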

I referred to these pages, from scikit-learn's official documentation and Stack Overflow.

Understanding how the class_weight parameter works:
https://scikit-learn.org/stable/modules/svm.html#unbalanced-problems
https://stackoverflow.com/a/30982811/3149277

Understanding what values to use for class_weight:
https://scikit-learn.org/stable/modules/svm.html#tips-on-practical-use
How does the class_weight parameter in scikit-learn work?

Due to time limits, I didn't define the custom metric function, since this approach already worked close to my expectations.
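For completeness, a custom metric of the kind asked about in the question could probably be built with `make_scorer` and `pos_label=1`. This is only a sketch under assumptions: the synthetic data from `make_classification`, the `LogisticRegression` estimator, and the parameter grid are placeholders, not the setup from the question. Note that for binary labels the built-in 'f1', 'precision', and 'recall' scoring strings already score the positive class (label 1) only, so an explicit scorer mainly helps when the label of interest is something else or when `f1_score` needs to be swapped for `precision_score` or `recall_score`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced placeholder data (about 3% of samples in class 1).
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.97, 0.03], random_state=0
)

# Scorer that only looks at class 1; zero_division=0 avoids warnings
# when a fold yields no positive predictions.
class1_f1 = make_scorer(f1_score, pos_label=1, zero_division=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0], "class_weight": [None, "balanced"]},
    scoring=class1_f1,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Including `class_weight` in the grid lets the search itself decide whether the 'balanced' setting improves the class-1 score for a given estimator.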