
I have read plenty of documentation (including this and linked references) on this, but it is simply too difficult for my simple mind: I cannot understand the logic of Python imports, and I usually waste plenty of time on random attempts until I reach a working permutation of settings and commands. Maybe this is because I usually use PyCharm, where everything magically works. Now I am using Visual Studio Code on a remote machine, and I need to ask here since I have wasted double the time I usually spend on this without reaching a permutation that works.

Using Python 3 on Linux (remote machine). The Python interpreter is configured with a virtual environment and does not correspond to the system-level one.

I have this project. Its folder structure is mirrored in the Linux filesystem, i.e., prj, src, common, etc. are all folders.

prj
 |- src
 |  |- common/
 |  |  - common1.py
 |  |  - common2.py
 |  |- pipelines/
 |  |  - main_pipeline1.py  (<- files prefixed with main_ have a __main__ entry point)
 |  |  - main_pipeline2.py
 |  |- other py modules, ... and others - some of these modules use common
 |- data/ ...
 |- doc/ ...

In main_pipeline1.py, I have: import common.common1 (see the correction at the end of the question).

In what follows, $[folder] represents the bash prompt: $ stands for a normal user and folder is the current folder.

When I run main_pipeline1.py as a normal user (on the remote machine), I first get an error:

$[prj/src] python pipelines/main_pipeline1.py
ModuleNotFoundError: No module named 'common'

In order to get it working, I need to add the current folder to PYTHONPATH (which is empty). So

$[prj/src] export PYTHONPATH=.
$[prj/src] python pipelines/main_pipeline1.py

works.

However, the script writes to a disk that requires root access, so the previous command needs to be run with sudo. I cannot find a way to run it using sudo.

I tried (after reading, among others, this):

$[prj/src] sudo python pipelines/main_pipeline1.py
$[prj/src] sudo /path/to/env/bin/python pipelines/main_pipeline1.py
$[prj/src] sudo -E /path/to/env/bin/python pipelines/main_pipeline1.py

They all fail, all but the first because Python cannot find the module common. Even though I asked sudo to keep the environment with -E (so PYTHONPATH should be preserved), the import fails. All the other imports from the virtual environment (which occur before import common) do not fail.

In the future I need to hand the code over to a sysadmin who might not have any specific knowledge of Python: I cannot ask him to set PYTHONPATH, check this, check that.

In this case, how should I organize my code so that import common (or any other module I write) succeeds? Do I really need to add PYTHONPATH=. every time?

Is there any kind soul willing to help me? Beer after the pandemic is over.

I made a correction:

import common.common1.py --> import common.common1
Antonio Sesto

3 Answers


Add this to the start of pipeline1.py, before the common import:

import os
import sys

# append the parent of this file's directory (i.e. src/) to the module search path
sys.path.append(os.path.realpath(os.path.join(os.path.dirname(__file__), "..")))

import common.common1
KetZoomer
    The problem with that is that it depends on the current directory, which is not consistent. Better, in my opinion, is to append `os.path.realpath( os.path.dirname( __file__ ) + "/..")`. – Tim Roberts May 02 '21 at 06:15
  • I've used this (or close to the same) a few times and eventually found ways to avoid it. flake8 complains about the sys.path.append appearing above another import. – jwal May 02 '21 at 07:01
  • `os.path.dirname(__file__)` corresponds to `src/pipelines`. Executing your command would mean adding `src` to the path: the folder I am running python from. Shouldn't `src` already be in the path if python is run from `src`? Please notice I am not running `python main_pipeline1.py` from `src/pipelines` but `python pipelines/main_pipeline1.py` from the `src` folder. – Antonio Sesto May 02 '21 at 09:54
  • @AntonioSesto, I've found that Python modules are weird, and not to depend on where they are run from (unless you're really smart). Does it work though? – KetZoomer May 02 '21 at 15:41
  • It worked with @TimRoberts's command, I think there's a typo in yours since you are joining an "absolute" subfolder ("/.."). – Antonio Sesto May 02 '21 at 17:09

I'm assuming Linux, and also that the Python software to be distributed has a setup.py.

Short answer: no, you don't have to modify PYTHONPATH or sys.path.


  • Create a virtual env (say /opt/myprog) as usual.

  • Activate it and install your package (say mypkg) and all its dependencies.

  • Put all executable scripts into the bin subdirectory of the virtual env and make sure they start with the #!/opt/myprog/bin/python3 shebang line. With a correct setup.py this will happen automatically during installation; see scripts. The scripts will then be able to import the installed package normally: import mypkg or from mypkg import ...

  • Finally, symlink the scripts to a directory in the users' PATH, e.g. /usr/local/bin. This must be done manually and only once, unless you add or rename a script.

Projects installed this way can be upgraded normally (with pip inside an activated environment), and the scripts can be invoked normally from the command line.
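The steps above can be sketched with a minimal setup.py. All names here (mypkg, the src/ layout, the main() entry points) are assumptions, not taken from the question; this variant uses entry_points console_scripts rather than the scripts= keyword linked above, but both produce executables in the env's bin/ with the correct shebang:

```python
# Hypothetical minimal setup.py; "mypkg" and the entry-point names are assumptions.
from setuptools import setup, find_packages

setup(
    name="mypkg",
    version="0.1.0",
    package_dir={"": "src"},               # packages live under src/
    packages=find_packages(where="src"),
    entry_points={
        # each entry becomes an executable in the env's bin/ directory,
        # with the #!/opt/myprog/bin/python3 shebang written automatically
        "console_scripts": [
            "pipeline1=mypkg.pipelines.main_pipeline1:main",
            "pipeline2=mypkg.pipelines.main_pipeline2:main",
        ],
    },
)
```

After `pip install .` inside the activated env, /opt/myprog/bin/pipeline1 exists and can be symlinked into /usr/local/bin.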

VPfB
  • You do not need to assume I am using Linux because I wrote it in the question :-) Thanks, however: your solution may be correct, but it is too complex for what I am trying to obtain. – Antonio Sesto May 02 '21 at 09:28
  • @AntonioSesto You are right about the Linux, sorry. If you find building packages for distribution too complex now, that's OK, but put it on your to-do list to learn later. For example `git` is much more complex, but for any serious work you need to learn at least the basic level. – VPfB May 02 '21 at 09:39
  • I normally use git, even its advanced features, and I find it incredibly easier than, e.g., understanding python imports :-) In this case I do not understand why the imports fail when I run the pipelines from the src folder since they are all relative to that folder. – Antonio Sesto May 02 '21 at 09:41
  • @AntonioSesto I do understand :-). It happened to me several times that I thought I understand the import (finally!), only to find out that it has even more complexity. – VPfB May 02 '21 at 09:45

Based on the structure and your comments, my guess is that you are not trying to pip-install this project as a proper Python package. Rather, it is just a directory with some scripts and modules you want to use. If so, you have at least a couple of options.

First, don't muck around with PYTHONPATH or modifying sys.path. That is almost always a worse approach.

The basic rule for Python importing: the root directory for the purpose of finding packages is the directory of the script/file used to invoke Python. (Ignoring built-ins and packages that have been formally installed, of course.)
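That rule can be checked directly: sys.path[0] is initialized from the directory containing the script, not from the current working directory. A small sketch, with the file and directory names made up for the demonstration:

```python
import os
import subprocess
import sys
import tempfile

# Write a tiny script into a subdirectory and run it from the parent directory,
# mimicking `python pipelines/main_pipeline1.py` run from src/.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "sub", "show_path0.py"), "w") as f:
        f.write("import sys; print(sys.path[0])")
    path0 = subprocess.run(
        [sys.executable, os.path.join("sub", "show_path0.py")],
        cwd=tmp, capture_output=True, text=True,
    ).stdout.strip()

# sys.path[0] is the script's directory ("sub"), not the invocation directory
print(os.path.basename(path0))  # prints: sub
```

This is exactly why `python pipelines/main_pipeline1.py` run from src/ puts src/pipelines, not src, on the search path, so `import common` is not found.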

Maybe the easiest solution is to move common under pipelines (and optionally make it a package by creating __init__.py inside of it). If you follow the logic of the basic rule, you'll understand why this works (and it won't be affected by full-path vs relative-path issues when invoking python).

src/
    pipelines/
        common/
            __init__.py       # Optional for Python 3.3+
            common1.py
            common2.py
        main_pipeline1.py
        main_pipeline2.py

Another approach is to create a simple runner script at the top level. The runner imports the pipelines, selects the right one (based on a command-line argument or some other configuration), and executes its top-level code (e.g., its main()). If the pipelines are not well organized for that type of importing and execution, this approach is quite a bit harder.

src/
    common/
        __init__.py           # Optional
        common1.py
        common2.py
    pipelines/
        __init__.py           # Optional
        main_pipeline1.py
        main_pipeline2.py
    runner.py
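The dispatch logic of such a runner can be sketched as below. The stub functions stand in for the real pipelines.main_pipeline1.main etc. (whose names are assumptions), so the snippet is self-contained:

```python
import sys

# Stub entry points standing in for pipelines.main_pipeline1.main and
# pipelines.main_pipeline2.main; in the real runner you would import them:
#   from pipelines import main_pipeline1, main_pipeline2
def pipeline1_main() -> str:
    return "ran pipeline1"

def pipeline2_main() -> str:
    return "ran pipeline2"

# Registry mapping a command-line name to a pipeline's main()
PIPELINES = {
    "pipeline1": pipeline1_main,
    "pipeline2": pipeline2_main,
}

def run(name: str) -> str:
    """Dispatch to the selected pipeline, or exit with a usage hint."""
    try:
        return PIPELINES[name]()
    except KeyError:
        sys.exit(f"unknown pipeline {name!r}; choose from {sorted(PIPELINES)}")

if __name__ == "__main__":
    print(run(sys.argv[1] if len(sys.argv) > 1 else "pipeline1"))
```

Because runner.py sits directly in src/, the script directory rule above puts src/ on sys.path, and `import common` works without touching PYTHONPATH.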

Separate issue: import modules, not files.

import common.common1.py   # No
import common.common1      # Yes
FMc
  • You got the point. This is exactly how the code is organized. There are several runner scripts joining the various "pipelines" in more complex processing. Indeed, each `main_pipeline.py` has a `main()` that is called by the runner scripts or, like in this case, from the usual `if __name__ == "__main__"` inside of the specific `main_pipeline` when one wants to test a specific pipeline. You said that the root directory corresponds to the folder python is invoked from, but then why doesn't it work when I invoke python in src? – Antonio Sesto May 02 '21 at 09:20
  • I made a correction because I was not importing `common.common1.py` but `common.common1`. – Antonio Sesto May 02 '21 at 09:26
  • I cannot move common inside of pipelines because there are other modules using common outside of pipelines. The folder structure is more crowded than in the question. – Antonio Sesto May 02 '21 at 09:30
  • Are `__init__.py` files needed? e.g. https://stackoverflow.com/questions/37139786/is-init-py-not-required-for-packages-in-python-3-3 – Antonio Sesto May 02 '21 at 09:44
  • @AntonioSesto I guess not. That change slipped by my radar. – FMc May 02 '21 at 16:18