I have a use case in ML where I have 2 classes, 0 and 1 for a given text.
Class-0:
Can afford some misclassificationsClass-1:
Very Important, can't afford any misclassifications
There's a huge imbalance in samples for both classes, about 30000 for class-0, and only 1000 for class-1
While doing the train-test split, I'm stratifying the split based on the labels, such that, the ratio of 70% train and 30% test is maintained for each label class.
I want to tune parameters in such a way that Precision
or Recall
for class-1 is improved. I tried using 'f1_macro', 'precision', 'recall' as individual metrics and all combined as well to tune using GridSearchCV, but it's less helpful due to majority samples being Class-0.
I'm exploring the safer ways to reduce class 0 data, although, there's only small degree we can reduce, anyways even without tuning, or with any parameters, class-0 always have above 98% f1-score.
So all I care about tuning is for class-1
.
Can you please suggest, perhaps a customized callable metric such that it only focuses on Class-1's Precision, Recall or F1-Score?
I'm using scikit-learn latest stable version.
Similar Problem here, the author is trying to Tune Class-1's F1 Score using Neural Networks (MLP)
in Keras
Its been suggested to try customizing metric, just didn't mention how.
The one who can answer here for Scikit-Learn, can also answer below link for Keras.
Hyperparameter tuning in Keras (MLP) via RandomizedSearchCV