
I am developing several Python projects for several customers at the same time. A simplified version of my project folder structure looks something like this:

/path/
  to/
    projects/
      cust1/
        proj1/
          pack1/
            __init__.py
            mod1.py
        proj2/
          pack2/
            __init__.py
            mod2.py
      cust2/
        proj3/
          pack3/
            __init__.py
            mod3.py

When I want to use functionality from proj1, for example, I add /path/to/projects/cust1/proj1 to sys.path (e.g. by setting PYTHONPATH, adding a .pth file to the site-packages folder, or modifying sys.path directly) and then import the module like this:

>>> from pack1.mod1 import something
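A minimal sketch of the .pth route, assuming two throwaway temporary directories stand in for /path/to/projects/cust1/proj1 and for the site-packages folder. site.addsitedir() appends a directory to sys.path and processes any .pth files in it, which is how Python treats site-packages at startup; each line of the .pth file that names an existing directory is appended to sys.path.

```python
import importlib
import os
import site
import sys
import tempfile

# Throwaway stand-in for /path/to/projects/cust1/proj1
projects = tempfile.mkdtemp()
proj1 = os.path.join(projects, "cust1", "proj1")
os.makedirs(os.path.join(proj1, "pack1"))
open(os.path.join(proj1, "pack1", "__init__.py"), "w").close()
with open(os.path.join(proj1, "pack1", "mod1.py"), "w") as f:
    f.write("something = 'from cust1/proj1'\n")

# Throwaway stand-in for site-packages, holding a one-line .pth file
fake_site = tempfile.mkdtemp()
with open(os.path.join(fake_site, "projects.pth"), "w") as f:
    f.write(proj1 + "\n")

# Appends fake_site to sys.path and, via projects.pth, proj1 as well
site.addsitedir(fake_site)
importlib.invalidate_caches()

from pack1.mod1 import something
print(something)  # from cust1/proj1
print(os.path.abspath(proj1) in sys.path)  # True
```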

As I take on more projects, different projects sometimes end up with identical package names:

/path/
  to/
    projects/
      cust3/
        proj4/
          pack1/    <-- same package name as in cust1/proj1 above
            __init__.py
            mod4.py

If I now simply add /path/to/projects/cust3/proj4 to sys.path as well, I can still import from proj1, but not from proj4:

>>> from pack1.mod1 import something
>>> from pack1.mod4 import something_else
ImportError: No module named mod4

I think the second import fails because Python only searches the first folder on sys.path that contains a pack1 package, and gives up if it does not find the mod4 module there. I asked about this in an earlier question (see import python modules with the same name), but the internal details are still unclear to me.
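The shadowing can be reproduced with a small script, assuming temporary directories stand in for cust1/proj1 and cust3/proj4 (make_package is a hypothetical helper). Once pack1 has been imported from the first matching sys.path entry, its __path__ lists only that one directory, and submodules such as pack1.mod4 are looked up only there.

```python
import importlib
import os
import sys
import tempfile

def make_package(root, package, module, source):
    """Hypothetical helper: create root/package/{__init__.py, module.py}."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir, exist_ok=True)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(source)

proj1 = tempfile.mkdtemp()  # stands in for /path/to/projects/cust1/proj1
proj4 = tempfile.mkdtemp()  # stands in for /path/to/projects/cust3/proj4
make_package(proj1, "pack1", "mod1", "something = 1\n")
make_package(proj4, "pack1", "mod4", "something_else = 4\n")

sys.path[:0] = [proj1, proj4]  # proj1 is searched first
importlib.invalidate_caches()

import pack1
from pack1.mod1 import something
print(pack1.__path__)  # only the pack1 folder under proj1

mod4_failed = False
try:
    from pack1.mod4 import something_else
except ImportError as exc:  # ModuleNotFoundError on Python 3.6+
    mod4_failed = True
    print("ImportError:", exc)
```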

Anyway, the obvious solution is to add another layer of namespace qualification by turning the project directories into super packages: add an __init__.py file to each proj* folder and extend sys.path with the customer folders instead of the project folders, e.g.

$ export PYTHONPATH=/path/to/projects/cust1:/path/to/projects/cust3
$ touch /path/to/projects/cust1/proj1/__init__.py
$ touch /path/to/projects/cust3/proj4/__init__.py
$ python
>>> from proj1.pack1.mod1 import something
>>> from proj4.pack1.mod4 import something_else
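A quick check of the super-package layout, again assuming temporary directories in place of /path/to/projects and a hypothetical touch() helper. With the customer folders on sys.path, proj1 and proj4 become distinct top-level packages, so their inner pack1 packages no longer collide.

```python
import importlib
import os
import sys
import tempfile

def touch(path, text=""):
    """Hypothetical helper: create a file (and its parent dirs) with text."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

projects = tempfile.mkdtemp()  # stands in for /path/to/projects
cust1 = os.path.join(projects, "cust1")
cust3 = os.path.join(projects, "cust3")

# proj1 and proj4 are now packages themselves (note their __init__.py)
touch(os.path.join(cust1, "proj1", "__init__.py"))
touch(os.path.join(cust1, "proj1", "pack1", "__init__.py"))
touch(os.path.join(cust1, "proj1", "pack1", "mod1.py"), "something = 1\n")
touch(os.path.join(cust3, "proj4", "__init__.py"))
touch(os.path.join(cust3, "proj4", "pack1", "__init__.py"))
touch(os.path.join(cust3, "proj4", "pack1", "mod4.py"), "something_else = 4\n")

# Equivalent of PYTHONPATH=/path/to/projects/cust1:/path/to/projects/cust3
sys.path[:0] = [cust1, cust3]
importlib.invalidate_caches()

from proj1.pack1.mod1 import something
from proj4.pack1.mod4 import something_else
print(something, something_else)  # 1 4
```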

Now I am running into a situation where different projects for different customers have the same name, e.g.

/path/
  to/
    projects/
      cust3/
        proj1/    <-- same project name as for cust1 above
          __init__.py
          pack4/
            __init__.py
            mod4.py

Importing from mod4 no longer works, for the same reason as before:

>>> from proj1.pack4.mod4 import yet_something_else
ImportError: No module named pack4.mod4

Following the same approach that solved this problem before, I would add yet another package / namespace layer and turn customer folders into super super packages.

However, this clashes with other requirements I have for my project folder structure, e.g.

  • a Development / Release structure to maintain several code lines
  • other kinds of source code, e.g. JavaScript, SQL, etc.
  • non-source files, e.g. documents or data.

A less simplified, more real-world depiction of some project folders looks like this:

/path/
  to/
    projects/
      cust1/
        proj1/
          Development/
            code/
              javascript/
                ...
              python/
                pack1/
                  __init__.py
                  mod1.py
            doc/
              ...
          Release/
            ...
        proj2/
          Development/
            code/
              python/
                pack2/
                  __init__.py
                  mod2.py

I don't see how to satisfy both the Python interpreter's requirements for a folder structure and my own at the same time. Maybe I could create an extra folder structure of symbolic links and use that in sys.path, but given the effort I'm already making, I have a feeling that something is fundamentally wrong with my entire approach. As a side note, I also have a hard time believing that Python really restricts my choice of source code folder names as much as it seems to in the case depicted.

How can I set up my project folders and sys.path so that I can import from all projects in a consistent manner if there are projects and packages with identical names?

ssc

3 Answers


This is the solution to my problem, although it might not be obvious at first.

In my projects, I have now introduced a convention of one namespace per customer. In every customer folder (cust1, cust2, etc.), there is an __init__.py file with this code:

import pkgutil
__path__ = pkgutil.extend_path(__path__, __name__)

All the other __init__.py files in my packages are empty (mostly because I haven't had the time yet to find out what else to do with them).

As explained here, extend_path makes Python aware that a package can consist of several directories located in different places: it scans sys.path for every other directory with the same package name and appends it to the package's __path__. From what I understand, the interpreter then no longer stops searching after it fails to find a module under the first package directory it encounters on sys.path, but searches all directories in __path__.
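A minimal reproduction of this convention, assuming two temporary directories stand in for the .../cust1/proj1/Development/code/python and .../cust1/proj2/Development/code/python roots (touch() is a hypothetical helper). Because each cust1/__init__.py calls pkgutil.extend_path, the cust1 package's __path__ ends up listing the cust1 folder from every sys.path entry that has one, so cust1.proj2 is found even though it lives under the second root.

```python
import importlib
import os
import sys
import tempfile

# Contents of every customer-level __init__.py, as in the answer
EXTEND = "import pkgutil\n__path__ = pkgutil.extend_path(__path__, __name__)\n"

def touch(path, text=""):
    """Hypothetical helper: create a file (and its parent dirs) with text."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

root1 = tempfile.mkdtemp()  # stands in for .../cust1/proj1/Development/code/python
root2 = tempfile.mkdtemp()  # stands in for .../cust1/proj2/Development/code/python
for root, proj, pack, mod, src in [
    (root1, "proj1", "pack1", "mod1", "something = 1\n"),
    (root2, "proj2", "pack2", "mod2", "something_else = 2\n"),
]:
    touch(os.path.join(root, "cust1", "__init__.py"), EXTEND)
    touch(os.path.join(root, "cust1", proj, "__init__.py"))        # empty
    touch(os.path.join(root, "cust1", proj, pack, "__init__.py"))  # empty
    touch(os.path.join(root, "cust1", proj, pack, mod + ".py"), src)

sys.path[:0] = [root1, root2]
importlib.invalidate_caches()

import cust1
print(cust1.__path__)  # includes root1/cust1 and root2/cust1

from cust1.proj1.pack1.mod1 import something
from cust1.proj2.pack2.mod2 import something_else  # found via root2
```

On Python 3.3+ a similar merge happens without any code if none of the cust1 folders contains an __init__.py (an implicit namespace package, PEP 420); extend_path is what the answer uses and also works on Python 2.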

I can now access all code in a consistent manner criss-cross between all projects, e.g.

from cust1.proj1.pack1.mod1 import something
from cust3.proj4.pack1.mod4 import something_else
from cust3.proj1.pack4.mod4 import yet_something_else

On the downside, I had to create an even deeper project folder structure:

/path/
  to/
    projects/
      cust1/
        proj1/
          Development/
            code/
              python/
                cust1/
                  __init__.py   <--- contains code as described above
                  proj1/
                    __init__.py <--- empty
                    pack1/
                      __init__.py <--- empty
                      mod1.py

but that seems perfectly acceptable to me, especially considering how little effort it takes to maintain this convention. For this project, sys.path is extended by /path/to/projects/cust1/proj1/Development/code/python.
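With many projects, these per-project sys.path entries could be collected automatically. A minimal sketch, assuming every project follows the <customer>/<project>/Development/code/python layout above; code_roots() and write_pth() are hypothetical helpers, and the resulting .pth file would still have to be placed in site-packages (or a virtualenv's site-packages) to take effect.

```python
import glob
import os
import tempfile

def code_roots(projects_root):
    """Hypothetical helper: every <cust>/<proj>/Development/code/python dir."""
    pattern = os.path.join(projects_root, "*", "*", "Development", "code", "python")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))

def write_pth(projects_root, pth_file):
    """Hypothetical helper: write one code root per line, as .pth expects."""
    roots = code_roots(projects_root)
    with open(pth_file, "w") as f:
        f.write("".join(root + "\n" for root in roots))
    return roots

# Usage with a throwaway tree standing in for /path/to/projects
projects = tempfile.mkdtemp()
for cust, proj in [("cust1", "proj1"), ("cust1", "proj2"), ("cust3", "proj4")]:
    os.makedirs(os.path.join(projects, cust, proj, "Development", "code", "python"))
pth = os.path.join(tempfile.mkdtemp(), "projects.pth")
roots = write_pth(projects, pth)
print(roots)
```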

As a side note, I noticed that of all the customer-level __init__.py files for the same customer, only the one in the directory that appears first in sys.path is executed, no matter which project I import from.
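The side note can be reproduced, assuming temporary directories root_a and root_b and a hypothetical LOADED_FROM marker added to each customer-level __init__.py (touch() is again a hypothetical helper). The cust2 package is initialized once, from the first matching sys.path entry; the other copies of __init__.py only contribute their directory to __path__ and are never executed.

```python
import importlib
import os
import sys
import tempfile

def touch(path, text=""):
    """Hypothetical helper: create a file (and its parent dirs) with text."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

def customer_init(label):
    """The answer's extend_path code plus a hypothetical LOADED_FROM marker."""
    return ("import pkgutil\n"
            "__path__ = pkgutil.extend_path(__path__, __name__)\n"
            f"LOADED_FROM = {label!r}\n")

root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
touch(os.path.join(root_a, "cust2", "__init__.py"), customer_init("root_a"))
touch(os.path.join(root_b, "cust2", "__init__.py"), customer_init("root_b"))
# proj3 exists only under root_b
touch(os.path.join(root_b, "cust2", "proj3", "__init__.py"))
touch(os.path.join(root_b, "cust2", "proj3", "mod3.py"), "value = 3\n")

sys.path[:0] = [root_a, root_b]
importlib.invalidate_caches()

from cust2.proj3.mod3 import value
import cust2
print(cust2.LOADED_FROM)  # root_a
```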

ssc

You should be using the excellent virtualenv and virtualenvwrapper tools.

zsquare
  • I am using them, but they only provide a solution as long as I import between projects of the same customer. If I need to import from a broader scope, the same conflict arises in a virtual environment as in the system Python environment. Also, as much as I like virtualenv, from my perspective it patches a major shortcoming of the Python interpreter. Am I really the only one who runs into this issue? – ssc Jan 20 '12 at 07:22
  • IMHO, you shouldn't be sharing functionality between projects that way. If they need to share some implementation, consider extracting it as a library and including it in both projects. – zsquare Jan 20 '12 at 08:51
  • Hmmm, so I ask how to do something and the response is not to do it?!? That's not what I was hoping for. Coincidentally, I am actually doing what you suggest: I extract functionality into a library and use that in other projects, which is exactly where this import path problem arises. I'm getting the impression that this is an issue everyone else just works around in some way, and that I will have to accept this import restriction and e.g. introduce a convention to only import between packages of the same customer (and also maintain one virtual environment per customer). – ssc Jan 22 '12 at 03:44

What happens if you accidentally import code from one customer/project in another and don't notice? When you deliver it will almost certainly fail. I would adopt a convention of having PYTHONPATH set up for one project at a time, and not try to have everything you've ever written be importable at once.

You can use a wrapper script per-project to set PYTHONPATH and start python, or use scripts to switch environments when you switch projects.
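Such a wrapper could also be written in Python itself. This is only a sketch: PROJECT_PATHS, project_env() and run_python() are hypothetical names, and the paths are the placeholders from the question. PYTHONPATH is replaced rather than extended, so only the selected project's code is importable in the child interpreter.

```python
import os
import subprocess
import sys

# Hypothetical mapping from a project key to the directories it needs on
# sys.path; the paths are placeholders following the question's layout.
PROJECT_PATHS = {
    "cust1/proj1": ["/path/to/projects/cust1/proj1/Development/code/python"],
    "cust3/proj4": ["/path/to/projects/cust3/proj4/Development/code/python"],
}

def project_env(project):
    """Copy of the current environment with PYTHONPATH set for one project."""
    env = dict(os.environ)
    # Replace rather than append, so other projects cannot leak in
    env["PYTHONPATH"] = os.pathsep.join(PROJECT_PATHS[project])
    return env

def run_python(project, *args):
    """Run the current interpreter with the given args in the project's env."""
    return subprocess.run([sys.executable, *args], env=project_env(project))
```

For example, run_python("cust1/proj1") would start an interactive interpreter for that project, and run_python("cust1/proj1", "-c", "import pack1.mod1") a one-off import.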

Of course some projects will have dependencies on other projects (those libraries you mentioned), but if you intend for the customer to be able to import several projects at once, then you have to arrange for the names not to clash. You can only have this problem when you have multiple projects on the PYTHONPATH that aren't supposed to be used together.

Ben
  • How do you 'accidentally import code'? Import statements are explicit in Python, so there is no implicit importing, and with the convention of fully qualified import statements such as cust1.proj1.pack1.mod1, there is little danger of accidentally importing the wrong, non-fully-qualified package and possibly overwriting a previously imported one. As a side note, if anything fails _after_ I've delivered, my test and deployment procedures really need a review... Re scripts / environment: that's exactly what virtualenvwrapper does, see the other answer. – ssc Jan 25 '12 at 15:36
  • For a single customer, project names are unique. There is however a lot of concept and code re-use taking place between projects for different customers; this has now even led to identical (library) project names for different customers - which is where the problem originated. Functionality is migrated along horizontal code integration lines and the library projects are similar, but not identical. – ssc Jan 25 '12 at 15:37
  • I admit there is not always a need to import criss-cross between arbitrary projects, but I am unwilling to accept a technical limitation of the Python interpreter that prevents me from doing so, so I'm glad I found this solution. Along the same lines, I am unwilling to accept "you shouldn't do that anyway" as an answer, as that simply seems too dogmatic to me. – ssc Jan 25 '12 at 15:38