
If I want to optimise the regularisation parameter for a logistic regression model (for example) based on area under the ROC curve, I can use GridSearchCV for a suitable range of parameters and set scoring='roc_auc'.

This can be done using `from sklearn.model_selection import GridSearchCV`, and there is no need to include `from sklearn.metrics import roc_auc_score`.

However, if I want to calculate the area under the ROC curve manually for a particular fitted model and dataset, then I do need to include `from sklearn.metrics import roc_auc_score`.
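
For concreteness, here is a minimal sketch of the two usages I mean (synthetic data and an illustrative parameter grid only):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import roc_auc_score  # only needed for the "manual" case

    X, y = make_classification(n_samples=200, random_state=0)

    # Case 1: GridSearchCV resolves the scorer from the string name
    grid = GridSearchCV(LogisticRegression(solver='liblinear'),
                        param_grid={'C': [0.01, 0.1, 1, 10]},
                        scoring='roc_auc', cv=5)
    grid.fit(X, y)

    # Case 2: computing the AUC "manually" requires the explicit import above
    probs = grid.predict_proba(X)[:, 1]
    print(roc_auc_score(y, probs))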

  • How does this work? I assume that by importing GridSearchCV we are somehow importing roc_auc_score behind the scenes? Unfortunately I can't seem to follow this through in the source code - I'd really appreciate an explanation.
  • If this is the case, does it also mean that by importing GridSearchCV we end up importing all possible scoring methods behind the scenes?
  • Why then can I not use roc_auc_score "manually" myself if I have imported GridSearchCV only and not roc_auc_score itself? Is it not implicitly "there" behind the scenes?

I appreciate this may be a more general question about python importing and not specific to scikit-learn...

Nick
  • If I understand you correctly, you just need to read up on how Python does modules and imports. https://docs.python.org/3/tutorial/modules.html `roc_auc_score` may be imported by `GridSearchCV` but it will be local to that, not in the global namespace. – Denziloe Aug 30 '18 at 13:19
  • 1
  • Thanks for the link - if I am understanding it correctly I think that `roc_auc_score` is probably being imported somehow but not being added to my global symbol table - but would be good to have my understanding confirmed! – Nick Aug 30 '18 at 13:34
  • This will make your path to understanding a lot easier: `dir()`. – Denziloe Aug 30 '18 at 15:18

1 Answer


GridSearchCV extends the BaseSearchCV class. This means that it will use the fit() method defined in BaseSearchCV.
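
A quick way to see this inheritance without reading the source is to inspect the method resolution order (the exact module path of BaseSearchCV may vary between scikit-learn versions, so treat this as a sanity check only):

    from sklearn.model_selection import GridSearchCV

    # BaseSearchCV should appear in GridSearchCV's method resolution order,
    # and fit() is inherited from it rather than defined on GridSearchCV itself.
    print([cls.__name__ for cls in GridSearchCV.__mro__])
    print('fit' in vars(GridSearchCV))  # False in the versions I checked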

So now, as you can see in the source code here:

    ...
    scorers, self.multimetric_ = _check_multimetric_scoring(
        self.estimator, scoring=self.scoring)
    ...

It checks all the parameters supplied during the construction of GridSearchCV. For the 'scoring' param, it calls the method _check_multimetric_scoring(). Now, at the top of this file, you will see many imports.

The method _check_multimetric_scoring is defined in the scorer.py file.

Tracing the method calls further, we reach the SCORERS dictionary:

    SCORERS = dict(explained_variance=explained_variance_scorer,
                   r2=r2_scorer,
                   neg_median_absolute_error=neg_median_absolute_error_scorer,
                   neg_mean_absolute_error=neg_mean_absolute_error_scorer,
                   neg_mean_squared_error=neg_mean_squared_error_scorer,
                   neg_mean_squared_log_error=neg_mean_squared_log_error_scorer,
                   accuracy=accuracy_scorer, roc_auc=roc_auc_scorer,
                   ...)

Looking at the roc_auc entry, we reach this definition:

    roc_auc_scorer = make_scorer(roc_auc_score, greater_is_better=True,
                                 needs_threshold=True)

Now look at the parameters: roc_auc_score is passed to make_scorer. So where is it imported from? Look at the top of this file and you will see this:

    from . import (r2_score, median_absolute_error, mean_absolute_error,
                   mean_squared_error, mean_squared_log_error, accuracy_score,
                   f1_score, roc_auc_score, average_precision_score,
                   precision_score, recall_score, log_loss,
                   balanced_accuracy_score, explained_variance_score,
                   brier_score_loss)

So from here, the actual scoring object is returned to GridSearchCV.
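
You can reproduce the same lookup yourself with the public get_scorer helper (the _score_func attribute used below is private, so treat that last check as illustrative only):

    from sklearn.metrics import get_scorer, roc_auc_score

    scorer = get_scorer('roc_auc')
    print(scorer)  # the scorer object built by make_scorer

    # Private attribute, shown only to confirm that the string 'roc_auc'
    # ultimately resolves to the roc_auc_score function.
    print(scorer._score_func is roc_auc_score)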

Now, the library uses relative and absolute imports, and as @Denziloe correctly said, those imports are local to that module; they do not end up in your global namespace.
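
You can confirm the namespace behaviour in an interactive session: importing GridSearchCV does not place roc_auc_score into your own namespace, even though it is reachable as an attribute of the sklearn.metrics module:

    from sklearn.model_selection import GridSearchCV  # no roc_auc_score import

    try:
        roc_auc_score  # not bound in this module's namespace
    except NameError:
        print("roc_auc_score is not in my globals")

    # It is bound inside the sklearn.metrics namespace, where scorer.py imported it
    import sklearn.metrics
    print(sklearn.metrics.roc_auc_score)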

See these answers, and this Python documentation page, for more information on import scope and namespaces.

Vivek Kumar
  • Thanks - this is very helpful. I assume that when you say "tracing the method calls...", the series of calls you are referring to is `_check_multimetric_scoring` -> `check_scoring` -> `get_scorer` -> `SCORERS`. Also it took me a few minutes to realise that `from . import` can refer to other modules at the same level and that `roc_auc_score` is actually defined in the sibling module `ranking.py` (unless I totally misread the code) – Nick Aug 30 '18 at 14:05
  • @Nick Yes, you are correct. Just like linux commands, `from . import` refers to files in same folder, and `from .. import` goes to root level. – Vivek Kumar Aug 30 '18 at 14:06
  • 1
  • Yes, but actually methods within files in the same folder, which is what caught me out. Actually [note to future readers!] the links you provided to other questions were very helpful as it is not always easy to find this information without knowing the right terms to search on. – Nick Aug 30 '18 at 14:09
  • No, I think you got it wrong. I think I explained it wrong. `from . import` will only be successful for imports declared in `__all__` under the `__init__.py` file in the folder. All imports made in that file are available directly using `from . import`. – Vivek Kumar Aug 30 '18 at 14:11
  • In the previous comment I was explaining about relative imports of the type: `from .ranking import` and `from ..model_selection import` type of things. Sorry to confuse you. – Vivek Kumar Aug 30 '18 at 14:12