
If I want to optimise the regularisation parameter for a logistic regression model (for example) based on area under the ROC curve, I can use GridSearchCV for a suitable range of parameters and set scoring='roc_auc'.

This can be done using `from sklearn.model_selection import GridSearchCV`, and there is no need to include `from sklearn.metrics import roc_auc_score`.

However, if I want to calculate the area under the ROC curve manually for a particular fitted model and dataset, then I do need to include `from sklearn.metrics import roc_auc_score`.
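
For concreteness, here is a minimal sketch of the two usages I mean (synthetic data and an illustrative parameter grid only):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import roc_auc_score  # only needed for the "manual" case

    X, y = make_classification(n_samples=200, random_state=0)

    # Case 1: GridSearchCV resolves the scorer from the string name
    grid = GridSearchCV(LogisticRegression(solver='liblinear'),
                        param_grid={'C': [0.01, 0.1, 1, 10]},
                        scoring='roc_auc', cv=5)
    grid.fit(X, y)

    # Case 2: computing the AUC "manually" requires the explicit import above
    probs = grid.predict_proba(X)[:, 1]
    print(roc_auc_score(y, probs))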

  • How does this work? I assume that by importing GridSearchCV we are somehow importing roc_auc_score behind the scenes? Unfortunately I can't seem to follow this through in the source code - I'd really appreciate an explanation.
  • If this is the case, does it also mean that by importing GridSearchCV we end up importing all possible scoring methods behind the scenes?
  • Why then can I not use roc_auc_score "manually" myself if I have imported GridSearchCV only and not roc_auc_score itself? Is it not implicitly "there" behind the scenes?

I appreciate this may be a more general question about python importing and not specific to scikit-learn...

Nick
  • If I understand you correctly, you just need to read up on how Python does modules and imports. https://docs.python.org/3/tutorial/modules.html `roc_auc_score` may be imported by `GridSearchCV` but it will be local to that, not in the global namespace. – Denziloe Aug 30 '18 at 13:19
  • 1
  • Thanks for the link - if I am understanding it correctly I think that `roc_auc_score` is probably being imported somehow but not being added to my global symbol table - but would be good to have my understanding confirmed! – Nick Aug 30 '18 at 13:34
  • This will make your path to understanding a lot easier: `dir()`. – Denziloe Aug 30 '18 at 15:18

1 Answer


GridSearchCV extends the BaseSearchCV class. This means that it will use the fit() method defined in BaseSearchCV.
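
A quick way to see this inheritance without reading the source is to inspect the method resolution order (the exact module path of BaseSearchCV may vary between scikit-learn versions, so treat this as a sanity check only):

    from sklearn.model_selection import GridSearchCV

    # BaseSearchCV should appear in GridSearchCV's method resolution order,
    # and fit() is inherited from it rather than defined on GridSearchCV itself.
    print([cls.__name__ for cls in GridSearchCV.__mro__])
    print('fit' in vars(GridSearchCV))  # False in the versions I checked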

So now, as you can see in the source code here:

    ...
    scorers, self.multimetric_ = _check_multimetric_scoring(
        self.estimator, scoring=self.scoring)
    ...

It checks all the parameters supplied during the construction of GridSearchCV. For the 'scoring' param, it calls the method _check_multimetric_scoring(). Now, at the top of this file, you will see many imports.

The method _check_multimetric_scoring is defined in the scorer.py file.

Tracing the method calls further, we reach the SCORERS dictionary:

    SCORERS = dict(explained_variance=explained_variance_scorer,
                   r2=r2_scorer,
                   neg_median_absolute_error=neg_median_absolute_error_scorer,
                   neg_mean_absolute_error=neg_mean_absolute_error_scorer,
                   neg_mean_squared_error=neg_mean_squared_error_scorer,
                   neg_mean_squared_log_error=neg_mean_squared_log_error_scorer,
                   accuracy=accuracy_scorer, roc_auc=roc_auc_scorer,
                   ...)

Looking at the roc_auc entry, we reach this definition:

    roc_auc_scorer = make_scorer(roc_auc_score, greater_is_better=True,
                                 needs_threshold=True)

Now look at the parameters: roc_auc_score is passed to make_scorer. So where is it imported from? Look at the top of this file and you will see this:

    from . import (r2_score, median_absolute_error, mean_absolute_error,
                   mean_squared_error, mean_squared_log_error, accuracy_score,
                   f1_score, roc_auc_score, average_precision_score,
                   precision_score, recall_score, log_loss,
                   balanced_accuracy_score, explained_variance_score,
                   brier_score_loss)

So from here, the actual scoring object is returned to GridSearchCV.
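
You can reproduce the same lookup yourself with the public get_scorer helper (the _score_func attribute used below is private, so treat that last check as illustrative only):

    from sklearn.metrics import get_scorer, roc_auc_score

    scorer = get_scorer('roc_auc')
    print(scorer)  # the scorer object built by make_scorer

    # Private attribute, shown only to confirm that the string 'roc_auc'
    # ultimately resolves to the roc_auc_score function.
    print(scorer._score_func is roc_auc_score)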

Now, the library uses relative and absolute imports, and as @Denziloe correctly said, those imports are local to that module; they do not end up in your global namespace.
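
You can confirm the namespace behaviour in an interactive session: importing GridSearchCV does not place roc_auc_score into your own namespace, even though it is reachable as an attribute of the sklearn.metrics module:

    from sklearn.model_selection import GridSearchCV  # no roc_auc_score import

    try:
        roc_auc_score  # not bound in this module's namespace
    except NameError:
        print("roc_auc_score is not in my globals")

    # It is bound inside the sklearn.metrics namespace, where scorer.py imported it
    import sklearn.metrics
    print(sklearn.metrics.roc_auc_score)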

See these answers, and this Python documentation page, for more information on import scope and namespaces.

Vivek Kumar
  • Thanks - this is very helpful. I assume that when you say "tracing the method calls...", the series of calls you are referring to is `_check_multimetric_scoring` -> `check_scoring` -> `get_scorer` -> `SCORERS`. Also it took me a few minutes to realise that `from . import` can refer to other modules at the same level and that `roc_auc_score` is actually defined in the sibling module `ranking.py` (unless I totally misread the code) – Nick Aug 30 '18 at 14:05
  • @Nick Yes, you are correct. Just like linux commands, `from . import` refers to files in same folder, and `from .. import` goes to root level. – Vivek Kumar Aug 30 '18 at 14:06
  • 1
  • Yes, but actually methods within files in the same folder, which is what caught me out. Actually [note to future readers!] the links you provided to other questions were very helpful as it is not always easy to find this information without knowing the right terms to search on. – Nick Aug 30 '18 at 14:09
  • No, I think you got it wrong. I think I explained it wrong. `from . import` will only be successful for imports declared in `__all__` under the `__init__.py` file in the folder. All imports made in that file are available directly using `from . import`. – Vivek Kumar Aug 30 '18 at 14:11
  • In the previous comment I was explaining about relative imports of the type: `from .ranking import` and `from ..model_selection import` type of things. Sorry to confuse you. – Vivek Kumar Aug 30 '18 at 14:12