I have been experimenting with sklearn metrics, particularly average_precision_score. However, I could find only one example of computing average_precision_score in the sklearn documentation, and that example uses an SVM. Below is the code snippet, along with a link to the documentation for reference:
Documentation - Precision Recall Sklearn and code reference
from sklearn import svm
from sklearn.metrics import average_precision_score

# Create a simple classifier
classifier = svm.LinearSVC(random_state=random_state)
classifier.fit(X_train, y_train)
y_score = classifier.decision_function(X_test)

# Compute the average_precision_score
average_precision = average_precision_score(y_test, y_score)
Now my question: in the example above, y_score is the output of decision_function (which returns confidence scores for the samples) and y_test contains the class labels. Since RandomForestClassifier has no decision_function method, unlike the SVM, how should I compute y_score?
I have tried, and seen other people use, both predict(X) (predict the class for X) and predict_proba(X) (predict class probabilities for X) to compute average_precision_score, and my results differ substantially between the two: with predict I get an average_precision_score of 0.74, while with predict_proba I get 0.94. My y_test contains class labels with values (1, 0). I am a little confused about which is correct. When should predict be used versus predict_proba, and why do they produce such different average precision scores? Any help would be highly appreciated.
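For reference, here is a minimal, self-contained sketch of the two approaches I am comparing (the dataset is synthetic, generated with make_classification, just to make the snippet runnable):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for my real dataset
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Approach 1: hard class labels (0/1) from predict
ap_predict = average_precision_score(y_test, clf.predict(X_test))

# Approach 2: probability of the positive class from predict_proba
ap_proba = average_precision_score(y_test, clf.predict_proba(X_test)[:, 1])

print(ap_predict, ap_proba)
```

With predict, the score is computed from binary 0/1 predictions rather than a continuous ranking, which seems to be where the discrepancy comes from, but I would like confirmation on which input average_precision_score actually expects.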