
Users should install our Python package via pip, or clone it from a GitHub repo and install it from source. Users should not run import Foo from within the source tree directory, for a number of reasons, e.g. the C extensions are missing (numpy has the same issue: read here). So we want to check whether the user is running import Foo from within the source tree, but how can we do this cleanly, efficiently, and robustly, with support for both Python 3 and 2?

Edit: Note that the source tree here is defined as the directory the code is downloaded to (e.g. via git or from the source archive), in contrast to the installation directory, where the code is installed to.

We considered the following:

  • Check for setup.py, or another file like PKG-INFO, which should only be present in the source tree. It’s not that elegant, and checking for the presence of a file is not very cheap given that this check will happen every time someone imports Foo. Also, there is nothing to stop someone from putting a setup.py outside of the source tree, in their lib/python3.X/site-packages/ directory or similar.
  • Parse the contents of setup.py for the package name, but this also adds overhead, and setup.py is not that clean to parse.
  • Create a dummy flag file that is only present in the source tree.
  • Some clever, but likely overcomplicated and error-prone, ideas like modifying Foo/__init__.py during installation to note that we are now outside of the source tree.
Chris_Rands
  • So what are you looking to show the user. If the user runs import Foo from the source tree, does he see some kind of exception or warning ? @Chris_Rands – Devesh Kumar Singh May 04 '19 at 11:05
  • @DeveshKumarSingh Yes exactly, either an exception like `numpy` raises, or more likely a custom warning message like `if in_source_tree: warnings.warn(msg, CustomWarning)` – Chris_Rands May 04 '19 at 11:41
  • I have a package structure with me where `import Foo; Foo.__file__` will show different paths based on where it was installed, and that can be compared to the source tree path by doing `os.getcwd()`, will such an approach work for you @Chris_Rands ! – Devesh Kumar Singh May 04 '19 at 11:46
  • @DeveshKumarSingh Not quite: if the user is in the installation directory, e.g. `lib/python3.X/site-packages/`, then `Foo.__file__` will match `os.getcwd()`; it will also match if the user is actually in the source tree. Note that the source tree here is the directory the code was downloaded to (normally via git clone), as opposed to the installation directory the code is installed to – Chris_Rands May 04 '19 at 12:10
  • Aah, To address this I can always do something like `git rev-parse --git-dir` which will return `.git` if it is a git repo, otherwise it will throw an error! And this command only works in the source dir cloned via git, not anywhere else! Will that solve the issue @Chris_Rands – Devesh Kumar Singh May 04 '19 at 12:13
  • @DeveshKumarSingh I am not sure this is better than the options i mentioned in the bullet points such as checking for `setup.py`, but write an answer if you think you can convince me! – Chris_Rands May 04 '19 at 12:17
  • Alright, you did say `Check Foo/__init__.py for the __file__ variable. This displays a relative path when imported from source and an absolute path otherwise, but only on Python 2.` But this works for Python3 too! So how will the answer work? A function where I pass the module name and it returns an exception based on if the function is run from source tree or outside it – Devesh Kumar Singh May 04 '19 at 12:18
  • Alright, you did say `Check Foo/__init__.py for the __file__ variable. This displays a relative path when imported from source and an absolute path otherwise, but only on Python 2`. But this works for Python3 too! So how will the answer work? A function where I pass the module name and it returns an exception based on if the function is run from source tree or outside it @Chris_Rands – Devesh Kumar Singh May 04 '19 at 12:39
  • The only reliable solution nowadays is the [`src` layout](https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure), so you can't import anything from the source tree and are forced to install the code one way or another (think editable installs etc). This way, you don't need to write any checks (they wouldn't be complete or non-contradictory anyway, so why bother). – hoefling May 05 '19 at 00:04
  • I must be missing something, but if your C extensions are missing in the bad use case, just `import` them and fail (perhaps with a helpful message)? – Davis Herring May 06 '19 at 05:03
  • @DavisHerring this is what we do now, but C extensions can also fail to be imported for other reasons – Chris_Rands May 06 '19 at 05:55
  • @hoefling Changing the directory structure here is not an option unfortunately as the package is mature – Chris_Rands May 06 '19 at 09:12

1 Answer


Since you mention numpy in your comments, and want to do it the way they do without fully understanding it, I figured I would break that down so you can see whether you could implement a similar process.


__init__.py

The error you are after starts here, which is what you linked to in your comments, so you already know that. It simply attempts to import __config__.py and fails if the file isn't there or can't be imported.

    try:
        from numpy.__config__ import show as show_config
    except ImportError:
        msg = """Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there."""
        raise ImportError(msg)

So where does the __config__.py file come from then and how does that help? Let's follow below...

setup.py

When the package is installed, setup is called and in turn performs some configuration actions. This is essentially what ensures that the package is properly installed rather than being run from the download directory (which I think is what you want to ensure).

The key here is this line:

    config.make_config_py()  # installs __config__.py

misc_util.py

That method comes from numpy's distutils/misc_util.py, which we can follow all the way down to here.

    def make_config_py(self,name='__config__'):
        """Generate package __config__.py file containing system_info
        information used during building the package.
        This file is installed to the
        package installation directory.
        """
        self.py_modules.append((self.name, name, generate_config_py))

That in turn runs generate_config_py (here), which writes the __config__.py file with some system information and the show() function.


Summary
The import of __config__.py is attempted and fails, raising the error you want, whenever setup.py has not been run, because running setup.py is what creates that file. This means that not only is a file check performed, but the file only ever exists in the installation directory. There is still some overhead from importing an additional file on every import, but whatever you do, making this check at all adds some overhead.
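A minimal Python 3 sketch of the same guard for a package other than numpy. It assumes a generated module named `_installed` that exists only in the built or installed package; that name and `SourceTreeWarning` are made up for illustration and are not part of numpy or setuptools. It issues a warning rather than raising, as suggested in the comments on the question.

```python
import importlib
import warnings


class SourceTreeWarning(UserWarning):
    """Hypothetical warning category for imports from the source tree."""


def warn_if_in_source_tree(package, marker="_installed"):
    """Warn and return True if `<package>.<marker>` cannot be imported.

    `marker` is an assumed name for a module that only the build step creates,
    so a failed import suggests the package is running from its source tree.
    """
    try:
        importlib.import_module(package + "." + marker)
    except ImportError:
        warnings.warn(
            "%s appears to be imported from its source tree; install it "
            "and start Python from another directory." % package,
            SourceTreeWarning,
            stacklevel=2,
        )
        return True
    return False
```

In `Foo/__init__.py` this would be called as `warn_if_in_source_tree(__name__)`. Like the numpy check, it cannot tell a missing marker apart from a marker that fails to import for some other reason.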


Suggestions

I think that you could implement a much lighter weight version of what numpy is doing while accomplishing the same thing.

Drop the distutils subfunction and create the checked file directly in your setup.py as part of the standard install. The file would then exist only in the installed directory after installation and never elsewhere, unless a user faked it (in which case they could probably get around just about anything you try).
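One way this could be sketched in Python 3, assuming the package builds with setuptools: subclass the `build_py` command so that a marker module is written into the build directory after the normal build step. The marker name `_installed` and the helpers `write_marker` and `make_build_py` are hypothetical names chosen for this example, not anything numpy or setuptools provides.

```python
import os


def write_marker(build_lib, package, marker="_installed"):
    """Write `<build_lib>/<package>/<marker>.py` and return its path."""
    target_dir = os.path.join(build_lib, *package.split("."))
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, marker + ".py")
    with open(path, "w") as fh:
        fh.write('"""Generated at build time; absent from the source tree."""\n')
    return path


def make_build_py(package):
    """Return a build_py subclass that adds the marker after the normal build."""
    # Imported lazily so write_marker stays usable without setuptools.
    from setuptools.command.build_py import build_py

    class BuildPyWithMarker(build_py):
        def run(self):
            super().run()
            # self.build_lib is the directory the install step copies from.
            write_marker(self.build_lib, package)

    return BuildPyWithMarker


# In setup.py (sketch): setup(..., cmdclass={"build_py": make_build_py("Foo")})
```

Because the marker is written only under the build directory, it never appears in the checkout itself. An editable or in-place install may not run this step at all, so the check would likely fire there too; that is worth testing against your own setup before relying on it.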

As an alternative (without knowing your application or what your setup file is doing), you may have a function that is normally imported anyway and isn't key to running the application but is good to have available (in numpy's case, the functions give information about the installation, like version()). Instead of keeping those functions where they are now, you make them part of the file that is created. Then you are at least loading something that you would otherwise load from somewhere else anyway.

With this method you are either importing something, which has some overhead, or raising the error. As a way to raise an error because the user isn't working from the installed directory, I think it's pretty clean and straightforward. Whatever method you use will carry some overhead, so I would focus on keeping it low-overhead, simple, and unlikely to cause errors of its own.

I wouldn't do anything complicated like parsing the setup file or modifying necessary files like __init__.py. I think you are right that those methods would be more error-prone.

Checking whether setup.py exists could work, but I would consider it less clean than attempting an import, which is already an optimized, standard Python operation. They accomplish similar things, but I think the numpy-style implementation is going to be more straightforward.
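For comparison, the file-existence check from the question might look like the sketch below. The function name `looks_like_source_tree` is hypothetical, and the sketch assumes setup.py sits in the directory directly above the package directory.

```python
import os


def looks_like_source_tree(package_file, sentinel="setup.py"):
    """Return True if `sentinel` sits in the directory above the package.

    `package_file` is expected to be the package's own `__file__`.
    """
    package_dir = os.path.dirname(os.path.abspath(package_file))
    parent = os.path.dirname(package_dir)
    return os.path.isfile(os.path.join(parent, sentinel))
```

Inside `Foo/__init__.py` this would be called as `looks_like_source_tree(__file__)`. It costs one filesystem check per import and, as the question points out, reports a false positive if a stray setup.py ends up next to the installed package.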

MyNameIsCaleb
  • Thanks- I did already check the source- I guess my issue is not so much that I don't understand it, but that I'm not sure whether this solution is too complicated to implement in our own package. Our `setup.py` file is completely different in structure from what `numpy` uses, and they have their own `distutils` modules as well. Perhaps you have a view on the pros/cons of this solution compared to the ones I bullet-pointed in the question? If we do go the `numpy` route (which now seems unlikely) we'd want it to be lightweight, self-contained and not intrusive to the current code structure – Chris_Rands May 07 '19 at 17:30
  • I added a bunch at the bottom because I ran out of comment space. Basically I think the `numpy` route but simplified is going to be the most easy to understand and implement method rather than some of the other options. I would prioritize lightweight and understandable. – MyNameIsCaleb May 07 '19 at 17:57