23

Does GridSearchCV use predict or predict_proba when auc_score is the score function?

The predict function returns hard class labels, so the resulting ROC curve has a single operating point and is always triangular. A more curved ROC curve is obtained from the predicted class probabilities. The latter is, as far as I know, more accurate. If so, the area under the 'curved' ROC curve is probably the better measure of classification performance within the grid search.
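To make the difference concrete, here is a minimal sketch comparing the two inputs to the ROC computation. The synthetic dataset, the LogisticRegression model, and all variable names are assumptions chosen for illustration; they are not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary problem (illustrative assumption).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_label = clf.predict(X_test)              # hard 0/1 labels
y_proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# ROC curve from hard labels: at most three points, hence the triangle.
fpr_label, tpr_label, _ = roc_curve(y_test, y_label)
# ROC curve from probabilities: one point per retained threshold.
fpr_proba, tpr_proba, _ = roc_curve(y_test, y_proba)

auc_label = roc_auc_score(y_test, y_label)
auc_proba = roc_auc_score(y_test, y_proba)

print("ROC points from labels:", len(fpr_label))
print("ROC points from probabilities:", len(fpr_proba))
print("AUC from labels:", auc_label)
print("AUC from probabilities:", auc_proba)
```

With hard labels the AUC reduces to the mean of sensitivity and specificity at the single threshold that predict applies, which is the balanced accuracy. The probability-based AUC summarizes the ranking over all thresholds.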

I am therefore curious whether the class labels or the class probabilities are used for the grid search when the area under the ROC curve is the performance measure. I tried to find the answer in the code, but could not figure it out. Does anyone here know the answer?

Thanks

Bastiaan van den Berg

2 Answers

32

To use auc_score for grid searching you really need to use predict_proba or decision_function, as you pointed out. This is not possible in the 0.13 release. If you pass score_func=auc_score, it will use predict, which doesn't make any sense.

Edit: since 0.14 it is possible to do grid search using the AUC score by setting the new scoring parameter to roc_auc: GridSearchCV(est, param_grid, scoring='roc_auc'). It will do the right thing and use predict_proba (or decision_function if predict_proba is not available). See the "what's new" page of the current dev version.

You need to install the current master from github to get this functionality or wait until April (?) for 0.14.
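For reference, a minimal sketch of the scoring='roc_auc' call described above. The dataset, estimator, and parameter grid are illustrative assumptions. The import path sklearn.model_selection is the one used by later scikit-learn releases; in the 0.14 era GridSearchCV lived in sklearn.grid_search.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary problem and a small grid (illustrative assumptions).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# scoring="roc_auc" makes the search score each fold with continuous
# classifier outputs rather than the hard labels returned by predict.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best mean cross-validated AUC:", search.best_score_)
```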

Andreas Mueller
  • Thanks for the answer. I will install the current master from github to get the desired functionality. – Bastiaan van den Berg Feb 20 '13 at 11:37
  • And for custom functions? I mean, I want to use a scoring function that scores against the ground truth and a predict_proba y matrix. – avances123 Dec 18 '14 at 16:39
  • Somehow overlooked your question avances123. Look at the "defining your own scoring function" documentation. You can provide any callable with signature ``myfunc(estimator, X_test, y_test)`` – Andreas Mueller Mar 30 '15 at 02:46
  • @AndreasMueller do you have any insight on this question? http://stackoverflow.com/questions/43377189/how-to-use-log-loss-in-gridsearchcv-with-multi-class-labels-in-scikit-learn – O.rka Apr 12 '17 at 18:40
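The custom-scorer route mentioned in the comments can be sketched as follows. The function name neg_log_loss_scorer, the dataset, and the grid are hypothetical choices for illustration; only the callable signature (estimator, X_test, y_test) comes from the comment above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV


def neg_log_loss_scorer(estimator, X_test, y_test):
    """Hypothetical scorer: score ground truth against the predict_proba matrix.

    GridSearchCV maximizes the score, so the log loss is negated.
    """
    proba = estimator.predict_proba(X_test)
    return -log_loss(y_test, proba, labels=estimator.classes_)


# Synthetic binary problem and a small grid (illustrative assumptions).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},
    scoring=neg_log_loss_scorer,  # any callable(estimator, X, y) -> float
    cv=5,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best mean negative log loss:", search.best_score_)
```

Because the callable receives the fitted estimator itself, it is free to call predict_proba, decision_function, or predict as needed.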
1

After some experiments with sklearn's SVC (which has predict_proba available), comparing results obtained with predict_proba and with decision_function, it seems that roc_auc in GridSearchCV uses decision_function to compute AUC scores. I found a similar discussion here: Reproducing Sklearn SVC within GridSearchCV's roc_auc scores manually
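One way to run this kind of experiment on your own installation is sketched below. ConflictingScores is a hypothetical toy classifier, written only for this check, whose decision_function and predict_proba rank samples in opposite order, so the two candidate AUC values cannot coincide. The outcome may depend on the installed scikit-learn version, which is why the sketch reports the result instead of assuming it.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics import get_scorer, roc_auc_score


class ConflictingScores(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier with deliberately opposite rankings."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self

    def decision_function(self, X):
        # Larger first feature -> more positive.
        return X[:, 0]

    def predict_proba(self, X):
        # Larger first feature -> LESS positive (reversed on purpose).
        p_pos = 1.0 / (1.0 + np.exp(X[:, 0]))
        return np.column_stack([1.0 - p_pos, p_pos])

    def predict(self, X):
        return self.classes_[(X[:, 0] > 0).astype(int)]


rng = np.random.RandomState(0)
X = rng.randn(200, 1)
y = (X[:, 0] + 0.5 * rng.randn(200) > 0).astype(int)

est = ConflictingScores().fit(X, y)

auc_df = roc_auc_score(y, est.decision_function(X))
auc_pp = roc_auc_score(y, est.predict_proba(X)[:, 1])
scored = get_scorer("roc_auc")(est, X, y)

# Report which method the built-in scorer actually called.
used = "decision_function" if np.isclose(scored, auc_df) else "predict_proba"
print("AUC via decision_function:", auc_df)
print("AUC via predict_proba:", auc_pp)
print("roc_auc scorer returned:", scored, "-> it used", used)
```

For real classifiers such as SVC the two outputs are usually monotonically related, so the AUC values often agree and the scorer's choice is hard to see without a contrived estimator like this one.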