
The problem

I want to distribute some custom Python packages to computing nodes. The installation path is not the same as the build path, so paths can't be hard-coded. The setup process should be no more complicated than source /path/to/MyOrgCode/setup.sh, and ideally should not require downloading or re-installing everything. I see several potential methods to do this, all of which seem to have major drawbacks.

Some background to give a concrete example

The high energy physics community uses a tool called CVMFS to distribute compiled executables, libraries, and data files to nodes on computing clusters. In general, you compile your code on a machine with the same architecture as the nodes, then publish it to a central repository. So I might build my code in /home/user/MyOrg, which, when published, would appear on a compute node at /cvmfs/MyOrg.opensciencegrid.org. To make this portable, you provide a setup script that might look something like

# MyOrg setup.sh script (intended to be sourced, so use BASH_SOURCE rather than $0)
MYORG_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PATH="$MYORG_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$MYORG_DIR/lib:$LD_LIBRARY_PATH"

This way everything is always set up relative to wherever the setup script is installed, so the whole directory tree can be moved around and renamed easily. I cannot see an equally simple way to do the same thing with Python packages.

Options for distributing Python packages

1) Use virtual environments

I can create a virtual environment alongside my 'bin' and 'lib' directories, copy the whole thing, and then just call source /path/to/MyOrgCode/venv/bin/activate in my setup script (a minimal sketch follows the list below). But:

  • virtual environments use hardcoded paths, so I can't move them by default
  • virtualenv has the --relocatable option, but there are lots of warnings about how it can mess things up. Plus, as far as I know, virtualenv is considered deprecated in favor of python -m venv
  • Since the venv would generally be installed somewhere without write permissions, users wouldn't be able to install further custom packages for development; pip install --user doesn't function inside a venv.
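
For concreteness, a minimal sketch of what this setup script could look like, assuming the venv ships next to the bin and lib directories (the venv location is illustrative, and the relocation problems above still apply):

# Option 1 sketch: activate a venv shipped alongside bin/ and lib/
# (the venv/ location relative to this script is an assumption)
MYORG_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$MYORG_DIR/venv/bin/activate"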

2) Install on demand

Have a pip install -r requirements.txt line in my setup script (see the sketch after this list). This has a number of potential problems:

  • Where to install? A --user install will pollute the user's default Python setup and may have unexpected consequences. We could use a venv, but then where do we put it?
  • This is really inefficient if packages have lots of code to compile or include large datasets.
  • If my Python packages are on a private VCS, I need to distribute access credentials with my setup script, which in general will be readable to anyone who knows the URL.
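
For illustration, a sketch of what install-on-demand could look like, creating a per-user venv on first use and reusing it afterwards; the cache location under $HOME is an assumption, not a recommendation:

# Option 2 sketch: build a per-user venv on first use, then reuse it.
# VENV_DIR is a hypothetical location; a site might prefer $TMPDIR or scratch space.
MYORG_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV_DIR="$HOME/.cache/MyOrg/venv"
if [ ! -f "$VENV_DIR/bin/activate" ]; then
    python3 -m venv "$VENV_DIR"
    "$VENV_DIR/bin/pip" install -r "$MYORG_DIR/requirements.txt"
fi
source "$VENV_DIR/bin/activate"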

3) Use PYTHONPATH

However many Python packages I have, put them all in my repository and then add an entry for each one to PYTHONPATH in my setup script (sketched after this list). Potential problems:

  • I seem to remember being told that modifying PYTHONPATH is something of a last resort, though right now I can't find any evidence to support that.
  • If the packages are located on a read-only filesystem (as CVMFS mounts are on the nodes), can they still be byte-compiled (i.e. generate .pyc files)? Is there a command I can run to force compilation before distribution?
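
A minimal sketch of the PYTHONPATH approach, assuming the packages sit under a python/ subdirectory of the repository (that layout is my own illustration). On the compilation question, CPython's standard library does include a compileall module that can byte-compile a tree ahead of time:

# Option 3 sketch: extend PYTHONPATH relative to this script.
# (the python/ subdirectory layout is an assumption)
MYORG_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PYTHONPATH="$MYORG_DIR/python:$PYTHONPATH"

# Before publishing, byte-compilation can be forced in place, e.g.:
#   python3 -m compileall /home/user/MyOrg/python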

Conclusion

Is there a recommended way to do this sort of thing? Is there a different method than the ones I've thought of, or a way around the pitfalls?

  • You can use conda for this: https://stackoverflow.com/questions/39280638/how-to-share-conda-environments-across-platforms or use Docker to build a container with the needed packages: https://www.docker.com/ – Mason Caiby Sep 25 '19 at 18:57
  • @MasonCaiby Docker is a no-go, as it is not widely enough supported on computing nodes (you generally have to take extra steps just to make sure you get a C++11-compatible compiler!). conda is a possibility, though heavier weight than I'd like – thegreatemu Sep 25 '19 at 19:02
