
I have an imbalanced dataset for a binary classification problem. I have built a Random Forest classifier and used k-fold cross-validation with 10 folds.

from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50)

I got the following results for the 10 folds:

results = model_selection.cross_val_score(model, features, labels, cv=kfold)
print(results)
[ 0.60666667  0.60333333  0.52333333  0.73        0.75333333  0.72        0.7
  0.73        0.83666667  0.88666667]

I have calculated the overall accuracy as the mean of the results, along with the standard deviation:

print("Accuracy: %.3f%% (%.3f%%)") % (results.mean()*100.0, results.std()*100.0)
Accuracy: 70.900% (10.345%)

I have computed my predictions as follows

from sklearn.model_selection import cross_val_predict

predictions = cross_val_predict(model, features, labels, cv=10)

Since this is an imbalanced dataset, I would like to calculate the precision, recall, and F1 score of each fold and then average the results. How can I calculate these values in Python?

Julia Meshcheryakova
Jayashree

2 Answers

40

When you use the cross_validate method, you can specify which scores to compute on each fold (cross_val_score only accepts a single metric):

from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

scoring = {'accuracy' : make_scorer(accuracy_score), 
           'precision' : make_scorer(precision_score),
           'recall' : make_scorer(recall_score), 
           'f1_score' : make_scorer(f1_score)}

kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50)

results = model_selection.cross_validate(estimator=model,
                                         X=features,
                                         y=labels,
                                         cv=kfold,
                                         scoring=scoring)

After cross-validation you will get a results dictionary with the keys 'test_accuracy', 'test_precision', 'test_recall' and 'test_f1_score' (plus 'fit_time' and 'score_time'), each storing that metric's value on every fold. For each metric you can then compute the mean and standard deviation with np.mean(results[key]) and np.std(results[key]), where key is one of the metric names above.
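
For example, a minimal sketch of that last step (assuming numpy is imported as np and results comes from the call above):

import numpy as np

for key in ['test_accuracy', 'test_precision', 'test_recall', 'test_f1_score']:
    # mean and standard deviation of the per-fold scores for this metric
    print("%s: %.3f (+/- %.3f)" % (key, np.mean(results[key]), np.std(results[key])))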

Eduard Ilyasov
  • How do I calculate the training and testing error for each fold? – Jayashree Oct 09 '17 at 04:26
  • cross_val_score calculates metric values on the validation data only. But you can build two custom CV iterators. The first yields the training positional indices of your features DataFrame and, in place of the validation indices, those same training indices again. The second yields the same training indices, but keeps the remaining positional indices as the validation split. – Eduard Ilyasov Oct 09 '17 at 04:53
  • After cross_val_score with the first custom cv you'll get the metric values on the train set, and with the second custom cv you'll get the metric values on the validation set (a sketch of this idea follows the comment thread). – Eduard Ilyasov Oct 09 '17 at 04:53
  • For version 0.19, it should be `model_selection.cross_validate` and not `model_selection.cross_val_score`. – ankurrc Feb 11 '18 at 00:32
  • Is it a good approach to use cross_val_predict with either cross_validate or cross_val_score? According to the sklearn documentation, "Passing these predictions into an evaluation metric may not be a valid way to measure generalization performance. Results can differ from cross_validate and cross_val_score unless all test sets have equal size and the metric decomposes over samples." – Nimra Nov 26 '22 at 21:16
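
A rough sketch of the idea from the comments above, reusing the question's features, labels and model: the cv argument of cross_val_score accepts any iterable of (train, test) index pairs, so passing the training indices as the "test" split yields training-fold scores.

from sklearn.model_selection import KFold, cross_val_score

kfold = KFold(n_splits=10, shuffle=True, random_state=42)
# Score each fitted model on the same indices it was trained on.
train_cv = [(train_idx, train_idx) for train_idx, _ in kfold.split(features)]
train_scores = cross_val_score(model, features, labels, cv=train_cv)

With a recent scikit-learn it is simpler to pass return_train_score=True to cross_validate, which adds 'train_<metric>' entries alongside the 'test_<metric>' ones.
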
1

All of the scores you mentioned (accuracy, precision, recall and F1) depend on the threshold you set for turning the model's predicted probabilities into class predictions. If you don't specify a threshold, the default is 0.5. The threshold should always be set according to the cost of misclassification; if no costs are given, you have to make an assumption.

In order to compare different models or hyperparameters, you might consider using the area under the precision-recall curve (PR-AUC), since it summarizes precision and recall across all thresholds rather than at a single one. In your specific case of imbalanced data, the PR-AUC is more appropriate than the AUC of the ROC curve.

See also here: https://datascience.stackexchange.com/a/96708/131238
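
For instance, a minimal sketch of estimating the cross-validated PR-AUC, assuming the same features and labels as in the question ('average_precision' is scikit-learn's built-in summary of the precision-recall curve):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate

model = RandomForestClassifier(n_estimators=50)
kfold = KFold(n_splits=10, shuffle=True, random_state=42)

# Average precision approximates the area under the precision-recall curve.
pr_auc = cross_validate(model, features, labels, cv=kfold, scoring='average_precision')
print("PR-AUC: %.3f (+/- %.3f)" % (pr_auc['test_average_precision'].mean(),
                                   pr_auc['test_average_precision'].std()))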

DataJanitor