
In PEP 366 (Main module explicit relative imports), which introduced the module-scope variable __package__ to allow explicit relative imports in submodules, there is the following excerpt:

When the main module is specified by its filename, then the __package__ attribute will be set to None. To allow relative imports when the module is executed directly, boilerplate similar to the following would be needed before the first relative import statement:

if __name__ == "__main__" and __package__ is None:
    __package__ = "expected.package.name"

Note that this boilerplate is sufficient only if the top level package is already accessible via sys.path. Additional code that manipulates sys.path would be needed in order for direct execution to work without the top level package already being importable.

This approach also has the same disadvantage as the use of absolute imports of sibling modules - if the script is moved to a different package or subpackage, the boilerplate will need to be updated manually. It has the advantage that this change need only be made once per file, regardless of the number of relative imports.
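As a minimal sketch of the "additional code that manipulates sys.path" the PEP alludes to, assume a file at <root>/foo/bar.py next to foo/baz.py (the same layout as the question). The harness below is hypothetical, including `BAR_SOURCE` and `run_direct`. It writes the two files to a temporary directory and runs bar.py by path with PYTHONPATH removed, so foo can only become importable through the sys.path insertion. Recent CPython releases deprecate relying on __package__ when __spec__ is not set, so newer interpreters may warn about this pattern.

```python
import os
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical foo/bar.py: the PEP 366 boilerplate plus a sys.path tweak,
# assuming the file lives exactly one level below the package root.
BAR_SOURCE = textwrap.dedent("""\
    if __name__ == "__main__" and __package__ is None:
        import sys
        from pathlib import Path
        # Make <root> (the directory containing foo/) importable.
        sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
        __package__ = "foo"

    from . import baz
    print("imported", baz.__name__)
""")


def run_direct() -> subprocess.CompletedProcess:
    """Write foo/bar.py and foo/baz.py to a temp dir, then run bar.py by path."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root, "foo")
        pkg.mkdir()
        (pkg / "bar.py").write_text(BAR_SOURCE)
        (pkg / "baz.py").write_text("")
        # Remove PYTHONPATH so foo is importable only via the sys.path tweak.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
        return subprocess.run([sys.executable, str(pkg / "bar.py")], cwd=root,
                              env=env, capture_output=True, text=True)


if __name__ == "__main__":
    result = run_direct()
    print(result.stdout or result.stderr)
```

This still hard-codes both the package name and the package depth, which is the maintenance burden the PEP goes on to describe.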

I have tried to use this boilerplate in the following setting:

  • Directory layout:

    foo
    ├── bar.py
    └── baz.py
    
  • Contents of the bar.py submodule:

    if __name__ == "__main__" and __package__ is None:
        __package__ = "foo"
    
    from . import baz
    

The boilerplate works when executing the submodule bar.py from the file system (the PYTHONPATH modification makes the package foo/ accessible on sys.path):

PYTHONPATH=$(pwd) python3 foo/bar.py

The boilerplate also works when executing the submodule bar.py from the module namespace:

python3 -m foo.bar
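To reproduce the two invocations above, a small harness can recreate the layout in a temporary directory and run both commands. The `run_both` helper and the trailing print added to bar.py are for illustration only. The harness assumes Python 3.7+ (for `capture_output`), and newer interpreters may also emit a deprecation warning on stderr for the __package__ assignment.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# bar.py from the question; the final print is added only to observe the result.
BAR = '''\
if __name__ == "__main__" and __package__ is None:
    __package__ = "foo"

from . import baz
print(__name__, __package__, baz.__name__)
'''


def run_both() -> dict:
    """Recreate foo/bar.py and foo/baz.py, run both commands, collect stdout."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root, "foo")
        pkg.mkdir()
        (pkg / "bar.py").write_text(BAR)
        (pkg / "baz.py").write_text("")
        env = dict(os.environ, PYTHONPATH=root)  # PYTHONPATH=$(pwd)
        commands = {
            "file path": [sys.executable, "foo/bar.py"],  # python3 foo/bar.py
            "-m": [sys.executable, "-m", "foo.bar"],      # python3 -m foo.bar
        }
        return {
            label: subprocess.run(cmd, cwd=root, env=env, check=True,
                                  capture_output=True, text=True).stdout.strip()
            for label, cmd in commands.items()
        }


if __name__ == "__main__":
    print(run_both())
```

Both invocations are expected to print `__main__ foo foo.baz`: in the first case __package__ comes from the boilerplate, and in the second it is set by the -m machinery before the module runs.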

However, the following alternative boilerplate, used as the contents of the bar.py submodule, works just as well in both cases:

if __package__:
    from . import baz
else:
    import baz

Furthermore, this alternative boilerplate is simpler and does not require any update of the bar.py submodule when it is moved together with the baz.py submodule to a different package (since it does not hard-code the package name "foo").

So here are my questions about the boilerplate of PEP 366:

  1. Is the first subexpression __name__ == "__main__" necessary or is it already implied by the second subexpression __package__ is None?
  2. Shouldn’t the second subexpression __package__ is None be not __package__ instead, in order to handle the case where __package__ is the empty string (like in a __main__.py submodule executed from the file system by supplying the containing directory: PYTHONPATH=$(pwd) python3 foo/)?
Géry Ogam
  • For your first question: https://stackoverflow.com/questions/419163/what-does-if-name-main-do – Brian McCutchon Sep 15 '20 at 22:26
  • @BrianMcCutchon I know what the first subexpression does, but my question was rather is the first subexpression necessary? Isn’t it implied by the second subexpression? – Géry Ogam Sep 16 '20 at 09:44
  • Note that `from . import baz` is not equivalent to `import baz`. It imports `baz` under a different name, which may lead to `baz` existing *twice* if any other code uses the other import form. – MisterMiyagi Sep 16 '20 at 13:57
  • @MisterMiyagi Thanks for your feedback. But since it is a boilerplate, by definition the boilerplate will be present in every local module that imports a local module, so this will never happen. – Géry Ogam Sep 16 '20 at 21:25
  • You would have to guard literally every import of `foo.baz` and of `baz`, not just inside `foo`. You would also have to forbid the use of any third-party `baz` module to avoid name clashes. Just something to keep in mind before someone uses this "simpler" alternative. – MisterMiyagi Sep 17 '20 at 06:14
  • @MisterMiyagi Interesting point, I hadn’t considered the possibility of another module using the *absolute* import `foo.baz`, which works even when `__package__` is `None`. Combined with the bare import `baz` used in the above submodule `bar`, this leads to two copies of `baz` being loaded, as you explained: `sys.modules['baz']` and `sys.modules['foo.baz']`. – Géry Ogam Sep 17 '20 at 10:39
  • So it seems that one should always use either absolute imports *from the root package* (`from foo import baz`, not just `import baz`) or relative imports (`from . import baz`) with the `__package__` manipulation given in the PEP 366 boilerplate for supporting `python <file path>` invocations. Since the whole point of relative imports is to avoid hard-coding the parent packages in submodules, so that the submodules can be moved without requiring an update, the boilerplate is useless: either absolute imports from the root package should always be used, or `python <file path>` invocations should not be supported. – Géry Ogam Sep 17 '20 at 10:40
  • @MisterMiyagi Do you arrive at the same conclusion? Otherwise, what is your recommendation? – Géry Ogam Sep 17 '20 at 10:47
  • My recommendation is not to mix executable scripts (``python foo/bar.py``) and executable modules (``python -m foo.bar``). While in simple cases both behave the same, and it is possible to fake *some* of the differences, both are fundamentally different things. – MisterMiyagi Sep 18 '20 at 09:17
  • @MisterMiyagi *Libraries* are only imported (`import <module name>`), never executed as the `__main__` module (`python <file path>` or `python -m <module name>`), so they can use relative imports (without the PEP 366 boilerplate) with their benefits (submodules can be moved without being updated). Only *applications* can be executed as the `__main__` module and are therefore problematic; they should either use absolute imports with their drawbacks (submodules cannot be moved without being updated) or relative imports (without the PEP 366 boilerplate) with their benefits *but without `python <file path>` support*. – Géry Ogam Sep 18 '20 at 12:10
  • @Maggyero As with the last "discussion" we had, that seems to be entirely orthogonal to what I said. These comment lectures are frankly getting very exhausting. Feel free to use chat, but I have no more interest in these derailing comment threads. – MisterMiyagi Sep 18 '20 at 12:16
  • @MisterMiyagi Yes, it is orthogonal (I did not disagree with your last comment), but so were your comments with respect to the two questions in this post, which are still unanswered. And there is nothing wrong with orthogonal information: your comments triggered an interesting discussion about absolute imports vs relative imports. I am trying to find a guideline for when one should be used instead of the other, which is a very practical issue. – Géry Ogam Sep 18 '20 at 13:56
  • @MisterMiyagi I thought the choice between absolute imports vs relative imports was directed by application module vs library module. But now I realize that it is directed by *entry-point module* vs non entry-point module. Entry-point modules (like `foo.__main__`) should always use absolute imports (they could also use relative imports with the PEP 366 boilerplate but it is longer, less readable and does not buy you anything) while non entry-point modules (like `foo.bar`) should always use relative imports (they could also use absolute imports but you lose the benefits of move-without-update). – Géry Ogam Sep 18 '20 at 15:20
  • @MisterMiyagi I think your comments had actually answered the question. Perhaps Maggyero's confusion is around when the boilerplate should be used? (The answer is "Never, unless you *really* want to allow a submodule to be executed directly as a script, but if you think you want that, you should really reconsider". The PEP doesn't say that, because it's a technical design doc, not end user documentation) – ncoghlan Sep 22 '20 at 08:05
  • @ncoghlan The question explicitly consists of two questions (is the main guard necessary? is the None guard correct?), which my comments do not address at all. They are not adequate answers as-is. "Don't do that" is an adequate answer to a question which has not been asked here. – MisterMiyagi Sep 22 '20 at 08:39



The correct boilerplate is none, just write the explicit relative import and let the exception escape if someone tries to run the module as a script or has sys.path misconfigured:

from . import baz

The boilerplate given in PEP 366 is just there to show that the proposed change is sufficient to allow users to make direct execution* work if they really want to, it isn’t intended to suggest that making direct execution work is a good idea (it isn’t, it is a bad idea that will almost inevitably cause other problems, even with the boilerplate from the PEP).

Your proposed alternative boilerplate recreates the problem caused by implicit relative imports in Python 2: the "baz" module gets imported as baz from __main__, but will be imported as "foo.baz" everywhere else, so you end up with two copies in sys.modules under different names.

Amongst other problems, this means that if some other module raises foo.baz.SomeException and your __main__ module tries to catch baz.SomeException, it won’t work, as those will be two different exception classes coming from two different module objects.
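The two-copies problem can be demonstrated in-process. This sketch assumes baz.py defines a hypothetical `SomeException`. It puts both <root> and <root>/foo on sys.path (the latter mimics sys.path[0] under `python foo/bar.py`) and imports the same file under both names:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def two_copies_demo():
    """Import one baz.py as both 'baz' and 'foo.baz' and compare the results."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root, "foo")
        pkg.mkdir()
        (pkg / "baz.py").write_text("class SomeException(Exception):\n    pass\n")
        # root makes 'foo.baz' importable; root/foo makes the bare 'baz' importable.
        extra = [root, str(pkg)]
        sys.path[:0] = extra
        try:
            bare = importlib.import_module("baz")
            qualified = importlib.import_module("foo.baz")
            caught = False
            try:
                raise qualified.SomeException("raised as foo.baz.SomeException")
            except bare.SomeException:
                caught = True  # not reached: the two classes are distinct
            except Exception:
                pass
            return bare is qualified, bare.SomeException is qualified.SomeException, caught
        finally:
            # Undo the sys.path and sys.modules changes made by this demo.
            for entry in extra:
                sys.path.remove(entry)
            for name in ("baz", "foo.baz", "foo"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(*two_copies_demo())  # False False False
```

All three checks come out False: two module objects, two unrelated SomeException classes, and an `except baz.SomeException` clause that does not catch `foo.baz.SomeException`.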

By contrast, if you use the PEP boilerplate, then __main__ will correctly import baz as "foo.baz", and the only thing you have to worry about is other modules potentially importing foo.bar.

If you want simpler boilerplate that explicitly guards against the "inadvertently making two copies of the same module under a different name" bug without hardcoding the package name, then you can use this:

if not __package__:
    raise RuntimeError(f"{__file__} must be imported as a package submodule")

However, if you are going to do that, you can just as well do from . import baz unconditionally as suggested above, and let the underlying exception escape if someone tries to run the script directly instead of via the -m switch.
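The following sketch compares the explicit guard with the unguarded `from . import baz`, using a throwaway foo package in a temporary directory (`run_bar`, `GUARDED` and `UNGUARDED` are illustrative names). Both variants fail under direct execution and succeed under -m. They differ only in which exception escapes: RuntimeError for the guard, ImportError for the bare relative import.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical bar.py with the explicit guard from the answer.
GUARDED = '''\
if not __package__:
    raise RuntimeError(f"{__file__} must be imported as a package submodule")

from . import baz
print("ok", baz.__name__)
'''

# Hypothetical bar.py with no boilerplate at all.
UNGUARDED = '''\
from . import baz
print("ok", baz.__name__)
'''


def run_bar(source: str, *args: str) -> subprocess.CompletedProcess:
    """Write foo/bar.py (with the given source) and foo/baz.py, then run python *args."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root, "foo")
        pkg.mkdir()
        (pkg / "bar.py").write_text(source)
        (pkg / "baz.py").write_text("")
        env = dict(os.environ, PYTHONPATH=root)
        return subprocess.run([sys.executable, *args], cwd=root, env=env,
                              capture_output=True, text=True)


if __name__ == "__main__":
    for source in (GUARDED, UNGUARDED):
        direct = run_bar(source, "foo/bar.py")   # direct execution: fails
        via_m = run_bar(source, "-m", "foo.bar")  # -m switch: succeeds
        print(direct.stderr.strip().splitlines()[-1])
        print(via_m.stdout.strip())
```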


* Direct execution means executing code from:

  1. A file path argument except directory and zip file paths (python <file path>).
  2. A -c argument (python -c <code>).
  3. The interactive interpreter (python).
  4. Standard input (python < <file path>).

Indirect execution means executing code from:

  5. A directory or zip file path argument (python <directory or zip file path>).
  6. A -m argument (python -m <module name>).
  7. An import statement (import <module name>).
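The values of __name__ and __package__ in each case can be probed with a script like the following. This is a sketch that launches `sys.executable` in a temporary directory. Case 3 (the interactive interpreter) is left out because it needs a terminal, and `probe_modes` and the probe files are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Prints the two attributes the PEP 366 guard inspects.
PROBE = "print(__name__, repr(__package__))\n"


def probe_modes() -> dict:
    """Run PROBE in each automatable execution case and collect its output."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root, "foo")
        pkg.mkdir()
        (pkg / "probe.py").write_text(PROBE)     # hypothetical submodule
        (pkg / "__main__.py").write_text(PROBE)  # used by case 5
        env = dict(os.environ, PYTHONPATH=root)
        py = sys.executable
        cases = {  # label: (argv, text fed to stdin)
            "1 file path": ([py, "foo/probe.py"], None),
            "2 -c": ([py, "-c", PROBE], None),
            "4 stdin": ([py], PROBE),
            "5 directory": ([py, "foo"], None),
            "6 -m": ([py, "-m", "foo.probe"], None),
            "7 import": ([py, "-c", "import foo.probe"], None),
        }
        return {
            label: subprocess.run(argv, input=stdin, cwd=root, env=env, check=True,
                                  capture_output=True, text=True).stdout.strip()
            for label, (argv, stdin) in cases.items()
        }


if __name__ == "__main__":
    for label, line in probe_modes().items():
        print(f"{label:12} {line}")
```

On Python 3, cases 1, 2 and 4 report `__main__ None`, case 5 reports `__main__ ''`, case 6 reports `__main__ 'foo'` and case 7 reports `foo.probe 'foo'`.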

Now to answer your questions specifically:

  1. Is the first subexpression __name__ == "__main__" necessary or is it already implied by the second subexpression __package__ is None?

It is hard to get __package__ is None anywhere other than the __main__ module with the modern import system. But it used to be a lot more common: rather than being set by the import system on module load, __package__ would instead be set lazily by the first explicit relative import executed in the module. In other words, the boilerplate is only trying to let direct execution work (cases 1 to 4 above), but __package__ is None used to imply either direct execution or an import statement (case 7 above), so the subexpression __name__ == "__main__" (true in cases 1 to 6 above) was necessary to filter out case 7.

  2. Shouldn’t the second subexpression __package__ is None be not __package__ instead, in order to handle the case where __package__ is the empty string (like in a __main__.py submodule executed from the file system by supplying the containing directory: PYTHONPATH=$(pwd) python3 foo/)?

No, because the boilerplate is only trying to let direct execution work (cases 1 to 4 above); it isn’t trying to let other flavours of sys.path misconfiguration pass silently.
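Both answers can also be checked mechanically by evaluating the candidate guards against the (__name__, __package__) pairs reported for each case on modern CPython. The values follow the observations in the comments below, and case 7 assumes the submodule is foo.bar. This is only a truth table, not executable boilerplate; `OBSERVED`, `GUARDS` and `cases_where_guard_fires` are illustrative names.

```python
# (__name__, __package__) per execution case on modern CPython 3.
# Case 7 assumes the module is imported as foo.bar.
OBSERVED = {
    "1-4 direct execution": ("__main__", None),
    "5 directory or zip": ("__main__", ""),
    "6 -m switch": ("__main__", "foo"),
    "7 import statement": ("foo.bar", "foo"),
}

# Candidate guards: the PEP's, and the variants proposed in the two questions.
GUARDS = {
    "PEP 366": lambda name, package: name == "__main__" and package is None,
    "question 1": lambda name, package: package is None,
    "question 2": lambda name, package: not package,
}


def cases_where_guard_fires(guard):
    """Return the execution cases for which a guard would set __package__."""
    return [case for case, (name, package) in OBSERVED.items()
            if guard(name, package)]


if __name__ == "__main__":
    for label, guard in GUARDS.items():
        print(f"{label:10} fires for: {cases_where_guard_fires(guard)}")
```

Dropping the __name__ check changes nothing on modern Python (question 1), while `not __package__` additionally fires for case 5 (question 2), which the PEP boilerplate deliberately leaves alone.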

Géry Ogam
ncoghlan
  • Thanks for the explanation and confirmation that your PEP 366 boilerplate was not a recommendation, just a proof of concept. Now, my questions were specifically about improving your boilerplate (though it should not be used): shouldn’t `if __name__ == "__main__" and __package__ is None: __package__ = "expected.package.name"` be `if not __package__: __package__ = "expected.package.name"` instead? Since it seems that: 1. `__name__ == "__main__"` is already implied by `__package__ is None`. 2. `__package__ is None` does not apply to `__main__.py` submodules, as for them `__package__ == ""`. – Géry Ogam Sep 22 '20 at 09:58
  • If you address this in your answer I will gladly accept it. – Géry Ogam Sep 22 '20 at 10:00
  • No, as the boilerplate is only trying to let direct execution work, it isn't trying to let other flavours of sys.path misconfiguration pass silently. – ncoghlan Sep 24 '20 at 04:16
  • Breaking down the guard: the `__name__` check is asking "Is this the main module or an import? If it's an import, don't do anything.", while the `__package__` check is asking "Was this run directly or via the `-m` switch? If run via `-m`, don't do anything". Thus, the snippet only runs for direct execution, and is skipped otherwise. – ncoghlan Sep 24 '20 at 04:20
  • If I understood correctly, you don’t consider `PYTHONPATH=$(pwd) python3 foo/` (which executes foo/__main__.py) as *direct execution*, though it does not use the `-m` switch? Remembering your last comment [here](https://stackoverflow.com/a/37339817/2326961) which stated that CPython delegates to `runpy._run_module_as_main` when passing a directory or zip file path command-line argument, *like when using the `-m` switch*, that makes sense not to consider it as direct execution. So this answers my question 2. – Géry Ogam Sep 24 '20 at 09:37
  • I have checked that `__package__ is None` with: 1. A file path argument except directories and zip files (`python <file path>`). 3. The interactive interpreter (`python`). 4. Standard input (`python < <file path>`). But not with: 5. A directory path or zip file path argument (`python <directory or zip file path>`). 7. An import statement (`import <module name>`). – Géry Ogam Sep 24 '20 at 10:32
  • I have also checked that `__name__ == "__main__"` in all cases except case 7, so it seems redundant with `__package__ is None` (cases 1, 2, 3 and 4). So I guess you used `__name__ == "__main__"` to document the intent ("Is this the main module or an import?") and to avoid depending on the implications of `__package__ is None`, which is used primarily for another purpose ("Was this run directly or via the `-m` switch?"). But `__name__ == "__main__"` is not strictly necessary. If so, you also answered my question 1. Thanks a lot Nick! – Géry Ogam Sep 24 '20 at 10:44
  • Yeah, it's hard to get `__package__ is None` anywhere other than the main module with the modern import system. It used to be a lot more common, as rather than being set by the import system on module load, the attribute would instead be set lazily by the first explicit relative import executed in the module. – ncoghlan Oct 12 '20 at 02:48
  • Very interesting, I was not aware of this. So basically, you wanted your boilerplate to target only cases 1 to 4. But `__package__ is None`, besides implying a direct execution (cases 1 to 4), *used to* also imply an import statement (case 7). So to narrow down to only cases 1 to 4, you had to remove case 7, which is exactly what `__name__ == "__main__"` does since the latter implies cases 1 to 6. Now I fully understand your boilerplate! I have extended your answer by adding the answers to the 2 questions of my post that you provided here in comments, so that I could accept it. – Géry Ogam Oct 12 '20 at 09:50