14

If I do this:

import lxml 

in python, lxml.html is not imported. For instance, I cannot call the lxml.html.parse() function. Why is this so?

dangerChihuahua007
  • 20,299
  • 35
  • 117
  • 206
  • 3
    You may notice that `import os` allows you to use `os.path.whatever` in your code. So, it's up to the module author. – Greg Hewgill Feb 15 '12 at 23:34

5 Answers5

15

Importing a module or package in Python is a conceptually simple operation:

  1. Find the .py file corresponding to the import. This involves the Python path and some other machinery, but will result in a specific .py file being found.

  2. For every directory level in the import (import foo.bar.baz has two levels), find the corresponding __init__.py file, and execute it. Executing it simply means running all the top-level statements in the file.

  3. Finally, the .py file itself (foo/bar/baz.py in this case) is executed, meaning all the top-level statements are executed. All the globals created as a result of that execution are bundled into a module object, and that module object is the result of the import.

If none of those steps imported sub-packages, then those sub-packages aren't available. If they did import sub-packages, then they are available. Package authors can do as they wish.

Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
  • 1
    I think it's worth clarifying that when importing a package, the .py file that is found in step 1 is the \_\_init\_\_.py for the package. Unless subpackages are imported in that \_\_init\_\_.py file, they will not be available. – C S Jul 03 '17 at 06:13
  • Isn't `from foo.bar import baz` exempt from this rule? At least as far as my projects go all such imports work even if all the `__init__.py`'s are empty. Please also look at [a question I posted](https://stackoverflow.com/questions/57772706/importing-deeply-nested-modules-in-python) – z33k Sep 03 '19 at 13:25
6

lxml is called a package in Python, which is a hierachical collections of modules. Packages can be huge, so they are allowed to be selective about what is pulled in when they are imported. Otherwise everybody would have to import the full hierarchy, which would be quite a waste of resources.

Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
4

It's by design. The package has the option to import the nested package in its __init__.py, then, you would be able to access the nested package without problems. It's a matter of choice for the package writer, and the intent is to minimize the amount of code that you probably won't use.

Irfy
  • 9,323
  • 1
  • 45
  • 67
2

lxml is a package, not a module. A package is a collection of modules. As it happens, you can also import the package directly, but that doesn't automatically import all of its submodules.

As to why this is, well, that's a question for the BDFL. I think it's probably because packages are generally quite large, and importing all the submodules would be an excessive performance penalty.

Katriel
  • 120,462
  • 19
  • 136
  • 170
1

This is to allow only the minimum amount of code to have to be loaded for multi-part libraries that you may not use the entirety of. For instance, you might not be using the html part of lxml, and thus not want to have to deal with loading its code.

Amber
  • 507,862
  • 82
  • 626
  • 550