2

My question is specific to the scikit-learn Python module, but I have had similar issues with matplotlib as well.

When I want to use sklearn, if I just do `import sklearn` and then call whatever submodule I need, like `sklearn.preprocessing.scale()`, I get the error "AttributeError: 'module' object has no attribute 'preprocessing'".

On the other hand, when I do `from sklearn import preprocessing` and then use `preprocessing.scale()`, it works normally.

When I use other modules like NumPy, it is sufficient to just `import numpy` and it works well.

Therefore, I would like to ask if anyone can tell me why this is happening and whether I am doing something wrong.

Thanks.

El Rakone
  • [This answer](http://stackoverflow.com/a/9049246/4014959) may be helpful. – PM 2Ring Jul 01 '16 at 11:25
  • The proposed duplicate, http://stackoverflow.com/questions/9048518/importing-packages-in-python, is asking about 'how do I structure my own package to import all'. Here the question is about the behavior of `scikit-learn`. – hpaulj Jul 01 '16 at 16:11
  • http://stackoverflow.com/questions/27744767/differences-in-importing-modules-subpackages-of-numpy-and-scipy-packages is a better duplicate, since it focuses on the difference between `numpy` and `scipy`. `scipy` like `scikit-learn` requires importing submodules individually. – hpaulj Jul 01 '16 at 16:14

2 Answers

2

A Python package is defined by the `__init__.py` file inside its directory. This file determines whether or not submodules are included when the package is imported.

When you do `import sklearn`, Python finds the file `sklearn/__init__.py` and executes it to create the `sklearn` module. This object is then bound to the name `sklearn`. Submodules are not implicitly imported by the interpreter.

However, when doing `from sklearn import preprocessing`, Python will first load the `sklearn` module as before. Then it will check whether `preprocessing` is an attribute of that module (e.g. a function), and if not, it will look for a submodule of that name (a file `sklearn/preprocessing.py` or a subpackage `sklearn/preprocessing/__init__.py`) and import that module too.
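The two cases above can be reproduced without sklearn. The following is a minimal sketch that builds a throwaway package on disk; the names `demo_pkg`, `sub` and `scale` are hypothetical and only stand in for `sklearn`, `preprocessing` and `scale`.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (demo_pkg and sub are hypothetical names).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")  # empty: imports no submodules
(pkg_dir / "sub.py").write_text("def scale(x):\n    return x * 2\n")
sys.path.insert(0, str(root))

import demo_pkg

# Only __init__.py has run so far, so the submodule is not an attribute yet.
before = hasattr(demo_pkg, "sub")

# 'sub' is not an attribute, so Python falls back to importing demo_pkg/sub.py.
from demo_pkg import sub

# Importing the submodule also binds it as an attribute of the parent package.
after = hasattr(demo_pkg, "sub")
result = demo_pkg.sub.scale(21)

print(before, after, result)  # False True 42
```

Once the submodule has been imported anywhere in the process, the attribute access on the parent package works as well, which is why the error can seem to come and go depending on what else has been imported.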

It happens that numpy does something like the following in its `__init__.py` file:

from . import random

Thus, when you do `import numpy`, the execution of that module triggers the import of `numpy.random`, which is then added as an attribute.
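As a rough illustration of that pattern (not numpy's actual `__init__.py`), here is a sketch of a throwaway package whose `__init__.py` imports its own submodule. The names `eager_pkg`, `sub` and `answer` are hypothetical.

```python
import sys
import tempfile
from pathlib import Path

# Throwaway package whose __init__.py imports its own submodule
# (eager_pkg and sub are hypothetical names).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "eager_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("from . import sub\n")
(pkg_dir / "sub.py").write_text("def answer():\n    return 42\n")
sys.path.insert(0, str(root))

import eager_pkg

# Running __init__.py already imported the submodule, so a plain
# 'import eager_pkg' is enough to reach it through the package.
has_sub = hasattr(eager_pkg, "sub")
value = eager_pkg.sub.answer()

print(has_sub, value)  # True 42
```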


Not importing submodules automatically is useful because sometimes you want to use only part of a package, and loading all of it could take a significant amount of time. For example, importing numpy takes something like half a second. This is time wasted if you only need a very small subset of its functionality.
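To get a feel for import cost on your own machine, one option is to time a fresh interpreter with and without the import. This is only a rough sketch: `time_fresh_import` is a hypothetical helper, the standard-library `decimal` module stands in for whatever package you care about, and the numbers will vary between systems.

```python
import subprocess
import sys
import time


def time_fresh_import(statement):
    """Return wall-clock seconds for a new interpreter to run `statement`."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", statement], check=True)
    return time.perf_counter() - start


# A fresh process is used so that modules cached in sys.modules
# by the current process do not hide the real import cost.
baseline = time_fresh_import("pass")
with_import = time_fresh_import("import decimal")

print(f"startup only: {baseline:.3f}s, with 'import decimal': {with_import:.3f}s")
```

On Python 3.7 and later, running `python -X importtime -c "import numpy"` gives a per-module breakdown instead of a single total.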


You may be interested in reading the documentation for packages.

Bakuriu
1

NumPy conveniently imports its submodules in its `__init__.py` file and adds them to `__all__`. There's not much you can do about it when using a library: it either does this or it doesn't. sklearn apparently doesn't.
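Whichever choice the library makes, a submodule can still be imported explicitly by its dotted name, after which the fully qualified call works. A minimal sketch with a throwaway package follows; `lazy_pkg`, `sub` and `scale` are hypothetical names, and the sklearn equivalent would be `import sklearn.preprocessing`.

```python
import sys
import tempfile
from pathlib import Path

# Throwaway package that does not import its submodule in __init__.py
# (lazy_pkg and sub are hypothetical names).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "lazy_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "sub.py").write_text("def scale(x):\n    return x / 2\n")
sys.path.insert(0, str(root))

# Importing by dotted name loads the package first, then the submodule,
# and binds the top-level name 'lazy_pkg' in this namespace.
import lazy_pkg.sub

# The fully qualified call now works, unlike after a bare 'import lazy_pkg'.
halved = lazy_pkg.sub.scale(10)

print(halved)  # 5.0
```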

poe123