
I have a submodule I'm vendoring in which is poorly written (it does not use relative imports but has two packages). To be clear, I'm not installing these packages; because they don't have well-maintained packages on PyPI, I'm just including the source code.

So the layout looks like the following:

root
  |
  |-code
  |   |-file.py
  |
  |-vendor
      |-submodule
           |-package_1
           |    |-alpha.py
           |-package_2
                |-beta.py

Unfortunately, beta.py tries to import package_1 which doesn't work because there's no __init__.py. Because I'm pulling all submodules fresh during CI/CD, this lack of relative imports breaks my tests.

This would work if everything is at the root directory, but I can't control the submodules. I also don't want to change alpha.py or beta.py because I don't want to deal with forks.

Is there any way to have a universal __init__.py or some equivalent so that when beta.py imports, it sees package_1 and alpha.py?

aronchick

1 Answer


__init__.py or lack thereof is irrelevant for whether something counts as a package on modern (3.3+) Python due to PEP 420's Implicit Namespace Packages.
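As a quick check of the PEP 420 point, here is a minimal sketch that builds a package directory with no `__init__.py` in a temporary location and imports from it. The name `demo_ns_pkg` is made up for illustration and is not part of the question's layout.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway directory holding a package folder with no __init__.py.
# "demo_ns_pkg" is a hypothetical name chosen to avoid clashing with real packages.
tmp_root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_ns_pkg"
pkg_dir.mkdir()
(pkg_dir / "alpha.py").write_text("VALUE = 'alpha loaded'\n")

# The *parent* of the package folder is what has to be on sys.path.
sys.path.append(str(tmp_root))
importlib.invalidate_caches()

alpha = importlib.import_module("demo_ns_pkg.alpha")
namespace_pkg = sys.modules["demo_ns_pkg"]

print(alpha.VALUE)
# A namespace package has no __init__.py, so it has no file to report.
print(getattr(namespace_pkg, "__file__", None))
```

The import succeeds without any `__init__.py`; what matters is that the directory containing the package is on `sys.path`.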

Something tells me the vendor packages expect the directory containing both of their packages (root/vendor/submodule in this case) to be added to sys.path by some mechanism (possibly just by being installed at top level in site-packages, not in some nested directory as you have it set up). If your repackaging neither puts them in the root of some entry in sys.path nor updates sys.path to include wherever you did put them, their packages won't work.

The solution is to install their packages to a sys.path location or update sys.path to include their location, so package_1 is a top-level package as package_2 expects.

Assuming these vendor modules are only used in file.py, you could have the import of file.py manually update sys.path with code like this:

import pathlib
import sys

# Get the path to file.py, go up to the parent directory, then down to 
# vendor/submodule
vendor_dir = pathlib.Path(__file__).parent.parent / 'vendor' / 'submodule'
sys.path.append(str(vendor_dir))

# import package_1.alpha or package_2.beta will now work
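To see the effect end to end, the following sketch recreates the question's layout in a temporary directory and tries the import before and after the `sys.path` update. The package names `vnd_pkg_1` and `vnd_pkg_2` and the helper `try_import_beta` are hypothetical stand-ins for `package_1`, `package_2`, and whatever your tests actually import.

```python
import importlib
import pathlib
import sys
import tempfile

# Recreate root/vendor/submodule/{vnd_pkg_1,vnd_pkg_2} in a temp directory.
root = pathlib.Path(tempfile.mkdtemp())
submodule = root / "vendor" / "submodule"
(submodule / "vnd_pkg_1").mkdir(parents=True)
(submodule / "vnd_pkg_2").mkdir(parents=True)
(submodule / "vnd_pkg_1" / "alpha.py").write_text("NAME = 'alpha'\n")
# beta uses an absolute import, like the vendored code in the question.
(submodule / "vnd_pkg_2" / "beta.py").write_text(
    "from vnd_pkg_1 import alpha\nSEEN = alpha.NAME\n"
)


def try_import_beta():
    """Return beta.SEEN, or None if the import fails."""
    importlib.invalidate_caches()
    try:
        return importlib.import_module("vnd_pkg_2.beta").SEEN
    except ModuleNotFoundError:
        return None


before = try_import_beta()       # submodule directory is not on sys.path yet
sys.path.append(str(submodule))
after = try_import_beta()        # both packages are now top-level

print(before, after)
```

The first attempt fails because neither package is reachable from any `sys.path` entry; the second succeeds without touching the vendored files.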
ShadowRanger
  • I think this is exactly right! So how do I do this? Just put sys.path.append("vendor/submodule") at the top of every file that imports this? Or is there a global place to append this to sys.path? – aronchick Sep 25 '21 at 14:54
  • @aronchick: It's unclear how you're intending to install your package (if it's being properly installed at all), so it's hard to say. If the vendor modules are only invoked from your root package, manually messing with `sys.path` when the root package is loaded is an option. You can look up the docs on `.pth` files as well, which the `site` module uses to add arbitrary paths to the module search paths when they're installed in the `site-packages` directory (the standard install location for third-party packages). – ShadowRanger Sep 26 '21 at 01:14
  • @aronchick: I gave one example of how you could make `root/code/file.py` insert a path relative to its own install location into `sys.path`. However you run `file.py`, it will know its own path, and can use that to find the path to the vendor modules if they're installed in the same place relative to `file.py` that your directory tree diagram indicates. – ShadowRanger Sep 27 '21 at 23:03
  • as an aside, is there a project wide place to just add to sys.path? i'm never sure what order files load and if a file loaded before this one, wouldn't it be missing in sys.path? – aronchick Sep 29 '21 at 15:19
  • @aronchick: Well, the assumption here is that nothing besides you is depending on these vendor-ed modules. So it doesn't matter if it's initially missing from `sys.path`, nothing but `code.file` should be using it. If you need to do it for a complicated package hierarchy, where any of them might use it, and so it must be added for all of them, you typically put a `__init__.py` in the root of the package hierarchy. `__init__.py` is executed when the package (or any of its subpackages/modules) is imported, *before* they're imported, so the `sys.path` manipulation can be done just-in-time. – ShadowRanger Sep 29 '21 at 18:03
  • If you did need something that ran before anything else, that's where `.pth` files come in. They get put in the user or system `site-packages` directory, and the `site` module (isn't guaranteed to run, as it can be turned off manually, but it's *almost* always run) scans those directories for `.pth` files and loads them to extend the set of directories to search (modern `pip` doesn't usually install `.pth` files, it just plunks packages in the root of `site-packages`, but the older `easy_install` tended to make a wrapper dir per package, using a `.pth` to ensure the package could be imported). – ShadowRanger Sep 29 '21 at 18:20
  • I'm not going to get into `.pth` files in depth here; you can read more on [Using .pth files](https://stackoverflow.com/q/15208615/364696), or just search the web for `python .pth` to find more resources than you can shake a stick at. – ShadowRanger Sep 29 '21 at 18:22
  • thank you so much for the explanation! my only thought would be that - today - only a small subset of files (all in the same directory) depend on the vendor'd package. However, i could see a scenario where we add an additional component which also depends on it in the future (not at the system level, just inside the same package) - and then do I have to append it again in the new component? Or do I just use the `__init__.py` file at the root? I assume the latter. – aronchick Sep 30 '21 at 01:16
  • @aronchick: If they're all under the same root package, the root's `__init__.py` will handle it. – ShadowRanger Sep 30 '21 at 02:00
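A rough sketch of what that root `__init__.py` could contain, assuming `root` itself is the importable package and `vendor/submodule` sits directly beneath it as in the question's tree. The helper name `add_vendor_dir` is hypothetical; the duplicate check is an addition so repeated imports don't keep growing `sys.path`.

```python
import pathlib
import sys


def add_vendor_dir(package_init_file, relative_parts=("vendor", "submodule")):
    """Append <package dir>/vendor/submodule to sys.path once and return it."""
    package_dir = pathlib.Path(package_init_file).resolve().parent
    vendor_str = str(package_dir.joinpath(*relative_parts))
    if vendor_str not in sys.path:
        sys.path.append(vendor_str)
    return vendor_str


# In the real root/__init__.py the call would be: add_vendor_dir(__file__)
# Here a placeholder path stands in for __file__ so the sketch runs anywhere.
example = add_vendor_dir(pathlib.Path("root") / "__init__.py")
add_vendor_dir(pathlib.Path("root") / "__init__.py")  # second call adds nothing
print(example)
```

Because the package's `__init__.py` runs before any of its submodules are imported, every module under `root` sees the vendored packages without repeating the path manipulation.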