Structuring a large Python repository, to not import everything

Question

I'm having trouble managing imports in a big software repo we have. For the sake of clarity, let's pretend the repo looks something like this:

repo/
    __init__.py
    utils/
         __init__.py
         math.py
         readers.py             
         ...
    ...

Now our __init__.py files are set up so that we can do something like this:

from repo.utils import IniReader

In this example, repo/utils/__init__.py would contain:

from .readers import IniReader, DatReader

This structure has worked out well for us from a readability standpoint, but we are now facing issues when trying to deploy applications.

The issue is this. Let's pretend I'm writing an app that looks like this:

from repo.utils import IniReader
if __name__ == '__main__':
    r = IniReader('blah.ini')
    print(r.fields)

Now the line from repo.utils import IniReader will execute repo/utils/__init__.py, which in this case imports both IniReader and DatReader. Let's pretend that the module defining DatReader looks something like this:

import numpy as np
import scipy
import tensorflow
from .math import transform

class DatReader:
    ...  # implementation elided

which adheres to PEP 8 by putting all the imports at the top of the file.

The problem is that DatReader requires some heavyweight imports (numpy, scipy and tensorflow are huge libraries). To make matters worse, repo/utils/math.py, where transform comes from, might itself contain something like from repo.contrib import lookup, which executes repo/contrib/__init__.py and starts a chain reaction that ends up importing our entire repository.
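One way to see such a chain reaction is to compare sys.modules before and after the import in a fresh interpreter. The sketch below is a hypothetical helper (modules_pulled_in is not part of the repo or of any library); it runs the statement in a subprocess so that modules already loaded in the current session don't hide what the import really costs.

```python
import subprocess
import sys


def modules_pulled_in(statement):
    """Run `statement` in a fresh interpreter and return the modules it newly imported.

    A subprocess is used so that modules already loaded in the current session
    don't mask what the statement actually drags in.
    """
    script = (
        "import sys\n"
        "before = set(sys.modules)\n"
        f"{statement}\n"
        "print('\\n'.join(sorted(set(sys.modules) - before)))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True,
    )
    return set(result.stdout.split())


if __name__ == "__main__":
    # A stdlib module is used here so the sketch runs anywhere; in the repo you
    # would pass e.g. "from repo.utils import IniReader" instead.
    heavy = {"numpy", "scipy", "tensorflow"}
    pulled = modules_pulled_in("import json")
    print(sorted(pulled))
    print("heavy modules loaded:", sorted(heavy & pulled))
```

Run from the directory that contains repo/, modules_pulled_in("from repo.utils import IniReader") would show whether numpy, scipy or tensorflow get loaded. CPython 3.7+ also offers python -X importtime -c "from repo.utils import IniReader", which prints a per-module import-time breakdown to stderr.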

This hasn't really been a problem for us developers, since we all have a full development environment set up, but now that we're trying to ship applications internally, this import hell is becoming an issue.

Is there a standard solution to this problem? We've talked about keeping the __init__.py files empty, or not putting all the imports at the top of each file as PEP 8 recommends. Both of these solutions come with compromises, so if anyone has suggestions or references, I'd love to hear them.

Thanks!

Xukrao · Accepted Answer · 2018-11-10T23:43:20.003

It might help to step back for a moment and look at the fundamental issue you seem to be facing, namely: "How do I deal with missing Python packages on users' machines?"

Basically there are two categories of solutions to this problem:

1. Help make the missing packages available on the user's machine.
   - You could distribute your code as a package that users install with pip. Just include the dependency specifications in your distributed package, and pip will automatically download and install any missing dependencies when users install it.
   - You could freeze your code, i.e. convert it into a self-contained application that already bundles all the required packages.
2. Divide your package dependencies into mandatory and optional ones, and adapt your code so that the absence of an optional package doesn't break all of the code.
   - As you already noted, you could sanitize the module-level imports (i.e. the imports in __init__.py files) so that optional packages are not loaded prematurely. In your case that would mean removing the DatReader import from repo/utils/__init__.py.
   - As you also noted, you could move optional package imports inside the classes or functions that need them. Style-wise this is not ideal, but the code is still perfectly valid. It normally doesn't matter that the import statement is executed every time the function runs: the module itself is loaded only once and then cached in sys.modules, so repeated import statements are cheap.
   - You could wrap the imports of optional packages in try-except clauses. This prevents a missing package from raising an ImportError at import time (though you'll of course still get an error once you try to use a class or function that depends on the missing package).
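To illustrate the deferred-import option, here is a minimal sketch. The class body and the centered() method are hypothetical, and the standard-library statistics module merely stands in for a heavyweight dependency such as numpy. The import only happens when centered() is called, and a second call reuses the module object already cached in sys.modules.

```python
import sys


class DatReader:
    """Hypothetical sketch: the heavy dependency is imported only when needed."""

    def __init__(self, values):
        self.values = values

    def centered(self):
        # Deferred import: this line runs on every call, but only the first call
        # executes the module. Later calls find it cached in sys.modules and
        # merely bind the local name, so the repeated statement is cheap.
        import statistics  # stand-in for a heavyweight package such as numpy

        mean = statistics.mean(self.values)
        return [v - mean for v in self.values]


if __name__ == "__main__":
    reader = DatReader([1, 2, 3])
    print(reader.centered())  # [-1, 0, 1]
    cached = sys.modules["statistics"]
    reader.centered()
    print(sys.modules["statistics"] is cached)  # True
```

Importing the module that defines this class no longer pulls in the heavy dependency; only code paths that call centered() pay for it.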

Example of an import in a try-except clause:

import warnings
try:
    import scipy
except ImportError:
    warnings.warn("The python package `scipy` could not be imported. As a result "
                  "the class `repo.utils.DatReader` will not be functional.")
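A variation on the example above, sketched under the assumption that it lives in the module defining DatReader: bind the name to None when the import fails, and let the class raise a clear error only when someone actually tries to use it. optional_import is a hypothetical helper, not a standard API; it only wraps importlib.import_module.

```python
import importlib
import warnings


def optional_import(name, feature):
    """Import and return module `name`, or warn and return None if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        warnings.warn(
            f"The python package `{name}` could not be imported. "
            f"As a result {feature} will not be functional."
        )
        return None


# Optional dependency bound at module level; None when scipy is not installed.
scipy = optional_import("scipy", "the class `repo.utils.DatReader`")


class DatReader:
    """Hypothetical sketch: fails with a clear message only when actually used."""

    def __init__(self, path):
        if scipy is None:
            raise ImportError("DatReader needs the optional package scipy (pip install scipy)")
        self.path = path
```

Catching ImportError this broadly can also hide genuine errors raised inside an installed but broken package, so it is worth keeping the try block limited to the import itself.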

Now, to come back to your original question ("Is there a standard solution to this problem?"): I'd say no. There's no silver bullet. All solutions come with their own advantages and disadvantages, and you'll have to decide which one is best for your specific situation.
