
When a package becomes large, it can be hard to remember where things are, and dotted paths to the object we want can be cumbersome. One way authors seem to address this is to bring references to the "best of" objects up to the top level, even though their code may actually live several package levels below.

This allows one to say:

from pandas import wide_to_long

instead of

from pandas.core.reshape.melt import wide_to_long

But what are the ins and outs of doing this, and the best practices around the method? Doesn't loading the top __init__.py with many imports (in order to make them available at the top level) mean that any import of a single object of the package suddenly takes much more memory than needed - since everything mentioned in the __init__.py is automatically loaded?

Yet, packages do it. See, for example, what can be imported from top level numpy or pandas below (code to run your own diagnosis can be found in this gist).

$ python print_top_level_diagnosis.py numpy
--------- numpy ---------
599 objects can be imported from top level numpy:
  19 modules
  300 functions
  104 types

depth   count
0   162
1   406
2   2
3   29
4   1

$ python print_top_level_diagnosis.py pandas
--------- pandas ---------
115 objects can be imported from top level pandas:
  12 modules
  55 functions
  40 types

depth   count
0   12
3   37
4   65
5   1

thorwhalen
  • I think it depends on the project. Some simple ones may need to load resources at the entry-level init but other larger projects should probably remain modular and only import when needed or have the developer import a module within the package like package.module1.module2. Django is pretty big and it uses the many dot functionality. I prefer this because it means I know where I'm importing from – bherbruck May 14 '20 at 15:36
  • Related: [Can I put a class definition into __init__.py?](https://stackoverflow.com/q/47740935/6862601) – codeforester Mar 05 '21 at 03:14
  • Memory is really not a concern here. Usually, in any case, most of a module/package is loaded into memory, you can't just pick and choose single objects. What changes is *what objects are available in what namespace* – juanpa.arrivillaga Mar 05 '21 at 03:32
  • Some of the best practices are discussed in this answer by the most amazing Aaron Hall: [Can someone explain __all__ in Python?](https://stackoverflow.com/a/35710527/6862601). – codeforester Mar 05 '21 at 05:42

1 Answer

All approaches have pros and cons, and in __init__.py you can pretty much do what you want. Having said that, an important guideline is that __init__.py should never have any "side effects". This means that it shouldn't do anything other than declare the namespace. Side effects can really confuse the users of your library or even become a security risk.

Example __init__.py in a package called "mylibrary":

import requests

result = requests.get("URL TO SOME VIRUS")
...

In this example, running import mylibrary actually downloads something (or sends information).

If you leave __init__.py free of side effects, there are still a few considerations. The first is load time. The first time your library is imported in a process, Python runs(!) the __init__.py of the package and of every parent package on the import path, even if you only asked for a single submodule; later imports reuse the cached module from sys.modules. This means that if an __init__.py contains a slow operation, for example loading a large file or importing a large number of external dependencies, your library can become really slow to load. You can work around this by wrapping expensive calls in a function and exposing the function, but it's still something to consider.

Another thing to consider is versioning of your library. If you move a file to a different folder, do you and all of your users need to update every import statement? If you look at the pandas library you'll see that they created api submodules (e.g. pandas.core.api) which contain all the public functions/classes/etc. The top-level imports are done from the api submodules, which ensures that after a change only one location needs its import statements updated and everything keeps working.
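To illustrate the idea, here is a rough, self-contained sketch that writes a throwaway package to a temporary directory and imports it. The package name `mylibrary_demo`, its layout, and the function `reshape_it` are all hypothetical; this mirrors the api-submodule pattern in spirit and is not how pandas itself is laid out:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write(path, source):
    """Create parent folders as needed and write a source file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)


root = Path(tempfile.mkdtemp())
pkg = root / "mylibrary_demo"  # hypothetical package name

# The implementation lives deep inside the package...
write(pkg / "core" / "__init__.py", "")
write(pkg / "core" / "reshape.py",
      "def reshape_it(x):\n    return list(reversed(x))\n")
# ...api.py is the single place that knows where it lives...
write(pkg / "api.py",
      "from mylibrary_demo.core.reshape import reshape_it\n")
# ...and the top-level __init__.py only re-exports from api.py.
write(pkg / "__init__.py",
      "from mylibrary_demo.api import reshape_it\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
mylibrary_demo = importlib.import_module("mylibrary_demo")

# Users call the short path; the object still records its real home.
result = mylibrary_demo.reshape_it([1, 2, 3])
defined_in = mylibrary_demo.reshape_it.__module__
```

If `reshape.py` later moves to another folder, only the import line in `api.py` has to change; code that calls `mylibrary_demo.reshape_it` keeps working.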

Gijs Wobben