The problem
I want to distribute some custom python packages to computing nodes. The installation path is not the same as the build path, so paths can't be hard-coded. The setup process should be no more complicated than `source /path/to/MyOrgCode/setup.sh`, and ideally should not require downloading or re-installing everything. I see several potential methods to do this, all of which seem to have major drawbacks.
Some background to give a concrete example
The high energy physics community uses a tool called CVMFS to distribute compiled executables, libraries, and data files to nodes on computing clusters. In general, you compile your code on a machine with the same architecture as the nodes, then publish to a central repository. So I might build my code in `/home/user/MyOrg`, which when published would appear on a compute node at `/cvmfs/MyOrg.opensciencegrid.org`. To make this portable, you provide a setup script that might look something like:
# MyOrg setup.sh script
# Use BASH_SOURCE, not $0: when a script is sourced, $0 is the invoking
# shell, so `dirname $0` would not point at the script's directory.
here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PATH="$here/bin:$PATH"
export LD_LIBRARY_PATH="$here/lib:$LD_LIBRARY_PATH"
This way everything is always set up relative to wherever the setup script is installed, and so the whole directory tree can be moved around and renamed easily. Somehow I cannot see a simple way to do the same thing with python packages.
Options for distributing python packages
1) Use virtual environments
I can create a virtual environment alongside my `bin` and `lib` directories, copy the whole thing, and then just call `source /path/to/MyOrgCode/venv/bin/activate` in my setup script. But:
- virtual environments use hardcoded paths, so I can't move them by default. `virtualenv` has the `--relocatable` option, but there are lots of warnings about how it can mess things up. Plus, as far as I know, `virtualenv` is considered deprecated in favor of `python -m venv`
- Since the venv would generally be installed somewhere without write permissions, users wouldn't be able to install further custom packages for development, since `pip install --user` doesn't function in a venv.
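To make the relocation problem concrete, here is a minimal sketch (the `MyOrgCode` layout is illustrative, not my real tree) showing where the hardcoded path lives:

```shell
# Build-time sketch: create the venv next to bin/ and lib/ (illustrative layout).
mkdir -p MyOrgCode
python3 -m venv MyOrgCode/venv
# MyOrgCode/venv/bin/pip install -r requirements.txt   # would run at build time

# The activate script bakes in the absolute build path; this is the line that
# breaks when the tree later appears under /cvmfs/... instead:
grep 'VIRTUAL_ENV=' MyOrgCode/venv/bin/activate
```

If the tree is mounted at a different path, `VIRTUAL_ENV` (and the shebang lines of the venv's console scripts) still point at the build location.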
2) Install on demand
Have a `pip install -r requirements.txt` line in my setup script. This has a number of potential problems:
- Where to install? A `--user` install will pollute the user's default python setup and may have unexpected consequences. Could use a venv, but then where do we put it?
- This is really inefficient if packages have lots of code to compile or large datasets included
- If my python packages are on a private VCS, I need to distribute access credentials with my setup script, which in general will be readable to anyone who knows the URL.
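If I did go this route, the venv would presumably have to live in a per-user writable location and be reused between sessions. A hedged sketch, assuming `$HOME/.cache` is writable (the `myorg-venv` name and `MYORG_VENV_DIR` variable are my invention, not an established convention):

```shell
# setup.sh fragment: create the venv once in a writable location, reuse it after.
venv_dir="${MYORG_VENV_DIR:-$HOME/.cache/myorg-venv}"   # made-up location
if [ ! -x "$venv_dir/bin/python" ]; then
    python3 -m venv "$venv_dir"
    # "$venv_dir/bin/pip" install -r requirements.txt   # still slow the first time
fi
source "$venv_dir/bin/activate"
```

This avoids polluting the user's default python setup, but the first-run cost and the credentials problem remain.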
3) Use PYTHONPATH
However many python packages I have, put them all in my repository and then add an entry for each one to PYTHONPATH in my setup script. Potential problems:
- I seem to remember being told that modifying PYTHONPATH is something of a last resort, though right now I can't find any evidence to support that.
- If the packages are located on a read-only filesystem (as a CVMFS mount is), can they be compiled (i.e. generate `.pyc` files)? Is there a command I can run to force compilation before distribution?
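The pre-compilation half of this, at least, seems answerable: the stdlib `compileall` module writes `.pyc` files ahead of time, so it could be run on the tree before publishing, while it is still writable. A sketch, where `pkgs` and `mypkg` are throwaway stand-ins for the real repository layout:

```shell
# Throwaway stand-in for the repository layout: one package under pkgs/.
mkdir -p pkgs/mypkg
printf 'VALUE = 42\n' > pkgs/mypkg/__init__.py

# What setup.sh would do: add the parent directory of the packages to PYTHONPATH.
export PYTHONPATH="$PWD/pkgs:$PYTHONPATH"

# What the publishing step would do: force bytecode compilation while writable.
python3 -m compileall -q pkgs

python3 -c 'import mypkg; print(mypkg.VALUE)'   # -> 42
```

Once the tree is read-only, Python simply skips writing `.pyc` files it can't create; imports still work, just without bytecode caching, unless the `__pycache__` directories were pre-built as above.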
Conclusion
Is there a recommended way to do this sort of thing? Is there a different method than the ones I've thought of, or a way around the pitfalls?