30

I'm writing a family of Python scripts within a project; each script is within a subdirectory of the project, like so:

projectroot
  |
  |- subproject1
  |    |
  |    |- script1.main.py
  |    `- script1.merger.py
  |
  |- subproject2
  |    |
  |    |- script2.main.py
  |    |- script2.matcher.py
  |    `- script2.merger.py
  |
  `- subproject3
       |
       |- script3.main.py
       |- script3.converter.py
       |- script3.matcher.py
       `- script3.merger.py

Now several of the scripts share some code. The shared code is best considered part of the project itself, and not something I would compile separately and make a library out of, or drop in a sitewide PYTHONPATH. I could place that code in various places, such as in the projectroot directory itself, or in a child directory of projectroot called common (perhaps).

However, most of the ways I have thought of so far involve making packages out of my subprojects with empty __init__.py files and using relative imports (or redundantly messing with sys.path in every subproject). Worse, it seems like building a package structure around this family of scripts runs afoul of the following warning from the rejected PEP-3122:

Attention! This PEP has been rejected. Guido views running scripts within a package as an anti-pattern.

If scripts within a package are anti-patternish, how can I set things up in a way that keeps the common code in the same project? Or is a module- and package-based system acceptable here? Which is the cleanest approach? (FWIW I would prefer to have a file such as shared.py or common.py in the project root directory, rather than making a utility directory that is a sibling to the "real" subprojects.)

Ray Toal
  • 1
    I believe Django uses a centralized entry point `manage.py` to run all of its scripts. Doing something like this could allow you to turn your `subprojectX`'s into packages, and handle importing centrally inside the "`manage.py`" (entry point) script. As packages, I believe, they will easily support a `common` module where your shared functionality could live. – dm03514 Aug 06 '13 at 18:03
  • 1
    I believe that should be [PEP-3122](https://www.python.org/dev/peps/pep-3122/), not PEP-32122. – user1071847 Feb 12 '18 at 17:51

3 Answers

31

I suggest putting trivial "launcher" scripts at the top level of your project, and making each of the subproject folders into packages. The modules in the packages can import each other or common code can be factored out into a common package.

Here's what the structure would look like, if we assume the various merger modules can be refactored into a shared version:

projectroot
  |- script1.py # launcher scripts, see below for example code
  |- script2.py
  |- script3.py
  |
  |- common
  |    |- __init__.py
  |    |- merger.py # from other packages, use from common import merger to get this
  |
  |- subproject1
  |    |- __init__.py # this can be empty
  |    |- script1_main.py
  |
  |- subproject2
  |    |- __init__.py
  |    |- script2_main.py
  |    |- script2_matcher.py
  |
  |- subproject3
       |- __init__.py
       |- script3_main.py
       |- script3_converter.py
       |- script3_matcher.py

The launcher scripts can be very simple:

from subproject1 import script1_main

if __name__ == "__main__":
    script1_main.main()

That is, all it does is import the appropriate "scriptN_main" module and run a function in it. Using a simple script may also have some small benefits for script startup speed, since the main module can have its compiled bytecode cached to a .pyc file, while scripts are never cached.

Note: I renamed your modules, swapping _ characters in for the . characters. You can't have a . in an identifier (such as a module name), since Python expects it to indicate attribute access. That meant those modules could never be imported. (I'm guessing that this is an artifact of the example files only, not something that you have in your real code.)
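To make the layout concrete, here is a minimal sketch that recreates part of it in a temporary directory and runs one launcher. Only the file names come from the structure above; the file contents (a `merge` function and a `main` that prints its result) are hypothetical. Because the launcher lives in projectroot, that directory is the first entry on the import path, so the sketch assumes the subproject module reaches the shared code with an absolute `from common import merger`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents; only the file names follow the layout above.
FILES = {
    "common/__init__.py": "",
    "common/merger.py": (
        "def merge(a, b):\n"
        "    return sorted(set(a) | set(b))\n"
    ),
    "subproject1/__init__.py": "",
    "subproject1/script1_main.py": (
        "from common import merger  # absolute import of the shared package\n"
        "\n"
        "def main():\n"
        "    print(merger.merge([3, 1], [2, 3]))\n"
    ),
    "script1.py": (
        "from subproject1 import script1_main\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    script1_main.main()\n"
    ),
}


def run_launcher():
    """Write the files to a temporary projectroot and run script1.py."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel, text in FILES.items():
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        # Running the launcher by path puts its directory (projectroot)
        # at the front of sys.path, so both packages are importable.
        result = subprocess.run(
            [sys.executable, str(root / "script1.py")],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_launcher())
```

Running `subproject1/script1_main.py` directly instead of the launcher would put `subproject1` itself on the import path, and the `common` import would then fail.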

Blckknght
  • Looks good, but what can I say inside of `subproject1.script1_main.py` to access `common`? I tried `import common` but received `File "movie/main.py", line 1, in import common ImportError: No module named common` I don't want to manually set `sys.path`. Am I missing something? – Ray Toal Aug 14 '13 at 03:10
  • I think that `import ..common` should work (an explicit relative import). Make sure you're running the scripts at the top level rather than running the sub-project files directly, or it might not realize it's in a package (you'll get an error about the `..` part of the import in that case). – Blckknght Aug 14 '13 at 04:32
  • 1
    1. The "common" directory needs an `__init__.py` file inside. 2. The command `import ..common` is invalid syntax, whereas `from .. import common` is correct. It requires that projectroot also contains `__init__.py` and is itself imported as the parent package. 3. If you don't import projectroot but you run a script in it, then you can simply `import common`, because the directory of the script is automatically added to the Python path on startup. – hynekcer Aug 15 '13 at 22:46
  • 15
    Ugh. I can't see a *better* way of doing this, so +1, but nonetheless, this is horrible. If I have a large suite of command-line scripts that all share code, I damn well want to be able to organise them with a sensible directory structure. That Python forces me to do things this way (or resort to using `PYTHONPATH`) is kind of upsetting. – Mark Amery Jan 13 '14 at 18:00
  • adding docker, build scripts, and docs make this structure ugly – Andrew Matiuk May 11 '21 at 05:45
1

My preference would be a separate "bin" or "scripts" directory, with subprojects as libraries / packages:

projectroot
  |
  |- scripts
  |
  |- lib
  |    |
  |    |- matcher.py
  |    |- merger.py
  |    |- subproject1
  |    |- subproject2
  |    `- subproject3

The idea being your scripts can each reference any subprojects necessary as usual packages. And your subprojects can also reference each other with imports.

You can then also have a main or shared script that sets up the subproject packages for you, if that helps.
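The answer does not say how a file under scripts finds the packages under lib. One possible approach, shown here purely as an assumption, is for each script to add projectroot/lib to `sys.path` before importing. The sketch below builds the layout in a temporary directory and runs a hypothetical `scripts/run_merge.py`; the script name, the `tool` module, and all file contents are invented for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical script: locates projectroot from its own path, then imports from lib.
SCRIPT = """\
import sys
from pathlib import Path

# scripts/run_merge.py -> parents[1] is projectroot
sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "lib"))

import merger
from subproject1 import tool

print(merger.merge([1], [2]), tool.NAME)
"""

# Hypothetical file contents; directory names follow the layout above.
FILES = {
    "scripts/run_merge.py": SCRIPT,
    "lib/merger.py": "def merge(a, b):\n    return a + b\n",
    "lib/subproject1/__init__.py": "",
    "lib/subproject1/tool.py": "NAME = 'subproject1.tool'\n",
}


def run_script():
    """Write the files to a temporary projectroot and run the script."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel, text in FILES.items():
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        result = subprocess.run(
            [sys.executable, str(root / "scripts" / "run_merge.py")],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_script())
```

Setting `PYTHONPATH` to the lib directory when invoking the scripts would achieve the same thing without touching `sys.path` in code.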

Matt S
  • I do like that division but I've edited my answer to show the reason for putting each script in a separate subproject to begin with -- it's because each "script" has multiple parts. I want to have some shared library code -- functions (and classes) usable by each of the scripts among the different subprojects. – Ray Toal Aug 06 '13 at 18:29
  • I think the shared code should go in the root of libs. Please see my edit. – Matt S Aug 06 '13 at 18:38
  • 1
    Matt - How would you import anything from `lib` in the scripts under `scripts`? You would not be able to use relative imports, since the scripts are not technically modules in a package. – Amelio Vazquez-Reina Aug 19 '13 at 21:25
0

I recently discovered this technique, which seems to work on Python 3.9. It's not far different from Blckknght's answer, but it avoids the need for run scripts for each subproject in projectroot itself.

projectroot
  |
  |- common
  |    |
  |    `- merger.py
  |
  |- subproject1
  |    |
  |    `- __main__.py
  |
  |- subproject2
  |    |
  |    |- __main__.py
  |    `- matcher.py

From the projectroot directory, run with

python -m subproject1
python -m subproject2

Effectively you are treating subproject1 and subproject2 as 'application bundles'.

Both subproject1 and subproject2 seem to be able to import common.merger directly without any special measures such as hacking the import path.

There's one glitch, which may or may not be important. Within each subproject the import root directory is projectroot so you have to use absolute imports or explicit relative imports within the project itself.

from . import matcher

or

import subproject2.matcher

but not

import matcher # ModuleNotFoundError: No module named 'matcher'

The other downside is that it requires the perhaps non-obvious -m flag to run the applications.
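As a rough check of the behaviour described above, the following sketch recreates the common and subproject2 directories in a temporary projectroot and runs `python -m subproject2` from there. The file names match the layout; the function bodies (`merge`, `match`) and the contents of `__main__.py` are hypothetical. Like the layout above, it has no `__init__.py` files and so relies on namespace packages.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents; only the file names follow the layout above.
FILES = {
    "common/merger.py": "def merge(a, b):\n    return a + b\n",
    "subproject2/matcher.py": (
        "def match(items, wanted):\n"
        "    return [x for x in items if x in wanted]\n"
    ),
    "subproject2/__main__.py": (
        "from common import merger  # shared code, found via projectroot\n"
        "from . import matcher      # explicit relative import\n"
        "\n"
        "print(matcher.match(merger.merge([1, 2], [3]), {2, 3}))\n"
    ),
}


def run_bundle():
    """Write the files to a temporary projectroot and run the bundle with -m."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel, text in FILES.items():
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        # cwd=root mirrors running the command from the projectroot directory.
        result = subprocess.run(
            [sys.executable, "-m", "subproject2"],
            cwd=root, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_bundle())
```

The `-m` flag puts the current directory on the import path, which is why `common` resolves only when the command is run from projectroot.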

Ian Goldby