
When using scikit-learn estimators that let us set n_jobs for parallel processing, do we need to import joblib.Parallel first, or will the estimator run in parallel without our importing joblib.Parallel?

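Some of the estimators that offer parallel processing are listed below (the two XGBoost classes come from the separate xgboost library, through its scikit-learn-compatible API):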

  • sklearn.linear_model.LogisticRegression

  • xgboost.XGBRegressor

  • xgboost.XGBClassifier

etc.
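A minimal sketch of the situation in the question, assuming a reasonably recent scikit-learn: the script never imports joblib, yet LogisticRegression accepts n_jobs=2 and fits normally. The iris dataset and max_iter=1000 are illustrative choices not taken from the question, and how much work n_jobs actually spreads across cores depends on the solver and problem setup.

```python
# Note: no "import joblib" and no "from joblib import Parallel" anywhere here.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# n_jobs=2 is just a constructor argument; scikit-learn imports whatever
# it needs internally to honour it.
clf = LogisticRegression(n_jobs=2, max_iter=1000)
clf.fit(X, y)

accuracy = clf.score(X, y)
print(f"training accuracy: {accuracy:.3f}")
```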

Leockl
  • No, you don't have to import `joblib`. – Chris Mar 17 '20 at 07:14
  • Thanks @Chris. Looking at the source code of the various scikit-learn packages that offer parallel processing, how do the packages locate `joblib` if we never imported `joblib` ourselves? Many thanks. – Leockl Mar 17 '20 at 08:03

1 Answer


Q: "... do we need to first import joblib.Parallel, or will the scikit-learn package work with parallel processing without first importing joblib.Parallel?"

The second option is correct: scikit-learn works without it, because it was designed and implemented to import, on its own, every package it knowingly depends on. Taking due care of its own internal dependencies is standard professional practice for a library, isn't it?
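One way to see where joblib comes from, without importing it yourself, is to inspect sys.modules after importing only scikit-learn. A minimal check, assuming either a recent release (which depends on the standalone joblib package) or an older one such as 0.20 (which ships a vendored sklearn.externals.joblib); the name filter is a heuristic chosen for this illustration.

```python
import sys

# Only scikit-learn is imported here; joblib is never imported explicitly.
import sklearn.linear_model  # noqa: F401

# scikit-learn's own modules executed their import statements, so some
# joblib module (standalone "joblib" or a vendored "...joblib") is loaded.
joblib_modules = sorted(
    name for name in sys.modules
    if name == "joblib" or name.endswith(".joblib")
)
print(joblib_modules)
```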


BONUS PART, for those interested in understanding WHY:

Those who want to see the trick can check the package at source level. The files reported below show that this scikit-learn release carries its own copy of joblib under sklearn/externals/joblib, which its modules import for themselves:

```
(base) Tue Mar 17 12:00:34 a64FX:~$ grep -R "joblib.Parallel" /home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/
/home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/decomposition/online_lda.py:        parallel : joblib.Parallel (optional)
/home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/decomposition/online_lda.py:            Pre-initialized instance of joblib.Parallel.
/home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/decomposition/online_lda.py:        parallel : joblib.Parallel
/home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/decomposition/online_lda.py:            Pre-initialized instance of joblib.Parallel
Binary file /home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/decomposition/__pycache__/online_lda.cpython-35.pyc matches
/home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py:                print("Using %s as joblib.Parallel backend instead of %s "
/home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py:    """Callback used by joblib.Parallel's multiprocessing backend.
/home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/externals/joblib/_dask.py:        joblib.Parallel will never access those results
/home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/externals/joblib/_dask.py:        # See 'joblib.Parallel.__call__' and 'joblib.Parallel.retrieve' for how
Binary file /home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/externals/joblib/__pycache__/parallel.cpython-35.pyc matches
Binary file /home/r2d2/anaconda2/pkgs/scikit-learn-0.20.0-py35h4989274_1/lib/python3.5/site-packages/sklearn/externals/joblib/__pycache__/_dask.cpython-35.pyc matches
```
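A similar search can be run from Python against whichever scikit-learn happens to be installed, without hard-coding a path. This sketch walks the package directory (located via sklearn.__file__) and lists import lines that mention joblib. find_joblib_imports and IMPORT_RE are hypothetical names for illustration, and the regex is a rough heuristic rather than a real import parser.

```python
import os
import re

import sklearn

# Heuristic: lines that start with "from" or "import" and mention joblib,
# e.g. "import joblib", "from joblib import effective_n_jobs" or, in older
# releases, "from ..externals.joblib import Parallel, delayed".
IMPORT_RE = re.compile(r"^[ \t]*(?:from|import)[ \t].*\bjoblib\b", re.MULTILINE)


def find_joblib_imports(package_dir):
    """Return (relative_path, line) pairs for .py files that import joblib."""
    hits = []
    for root, _dirs, files in os.walk(package_dir):
        for filename in files:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(root, filename)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                source = fh.read()
            for match in IMPORT_RE.finditer(source):
                rel = os.path.relpath(path, package_dir)
                hits.append((rel, match.group(0).strip()))
    return hits


sklearn_dir = os.path.dirname(sklearn.__file__)
hits = find_joblib_imports(sklearn_dir)
for rel, line in hits[:10]:
    print(f"{rel}: {line}")
```

The exact files reported differ between releases, but on any version the matches live inside scikit-learn itself, which is why user code never has to import joblib.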
user3666197
  • Many thanks for your help here @user3666197. The key takeaway here is that scikit-learn packages have built-in inheritance from scikit-learn's BaseEstimator – Leockl Mar 25 '20 at 10:31