On AWS Elastic Beanstalk, I have some Python packages installed in a directory that is not part of the standard Python packages path (either for 2.6.x or for the 2.7.x version used by the Elastic Beanstalk environment). As a result, these packages are not (by default) visible to the AWS-EB deployment process when it installs the packages listed in requirements.txt. This can result in redundant packages being installed, often at the cost of (very) long deployment times.

Is there a way to make the directory where my packages are installed visible to the deployment process?


Conceptually, since (I assume) requirements.txt processing occurs in my application's virtual environment (does it?) I could

echo 'export PYTHONPATH="/anaconda/lib/python2.7/site-packages"' >> /opt/python/run/venv/bin/activate

at some stage before requirements.txt is processed and (for tidiness)

sed -i '/^export PYTHONPATH/d' /opt/python/run/venv/bin/activate

when it deactivates. But it isn't clear to me that this would happen at the right point in deployment. And anyway, it doesn't work because of permissions issues (I'm denied when I run the commands over eb ssh, and as container_commands they have no effect). Perhaps something like this is on the right path, though; are there places I could "hook" similar commands? (In any case it illustrates roughly what I'm trying to do.)

orome
  • Why don't you just add the path to your Python path in your code? – Nick Humrich Jan 24 '15 at 23:22
  • My code doesn't do the deployment. My goal is to have the directory on the Python package path when requirements.txt is processed (not when my code runs) so that any dependencies encountered there can be fulfilled with packages I've installed already in that directory, rather than being (slowly) installed again. – orome Jan 24 '15 at 23:25
  • Why don't you just not include the modules in your requirements.txt? – Nick Humrich Jan 24 '15 at 23:26
  • It's matplotlib, scipy, numpy, etc. They take up to 45 minutes to build. – orome Jan 24 '15 at 23:31
  • The requirements.txt is basically a "what do you want me to install for you" file. If you're installing those modules in other ways, just don't put them in your requirements.txt and they won't be re-installed. – Nick Humrich Jan 24 '15 at 23:33
  • @NickHumrich: See [this](http://stackoverflow.com/questions/15722641/how-to-install-matplotlib-on-elastic-beanstalk/15881797#15881797) and [this](http://www.zezuladp.com/2014/10/scaling-numpy-and-scipy-with-django-and.html), for example. – orome Jan 24 '15 at 23:34
  • @NickHumrich: The issue is dependencies though. If something in requirements.txt depends on some chunk of the science stack, that could be installed as part of the listed item in requirements.txt (even if it's not listed there, [right](http://stackoverflow.com/q/28121318/656912))? If the AWS process exclusively installed what's explicitly listed in requirements.txt, that's great, and I'm done! – orome Jan 24 '15 at 23:35
  • I fail to see the issue. Yum does the same thing pip does, it installs all sub-dependencies. So if you yum installed modules, you don't need those modules in requirements.txt – Nick Humrich Jan 24 '15 at 23:40
  • Have you tried not putting the modules in requirements.txt? – Nick Humrich Jan 24 '15 at 23:41
  • @NickHumrich: The issue is that I will need to check the dependencies of every package I install to see if it depends on anything in the science stack; if it does, I then need to install it somewhere in a context that sees the packages I've already installed via Anaconda. There are three possibilities: (a) using yum (can I install the same set of things through yum that I can through requirements.txt; does yum see the Anaconda packages?); (b) using pip in `commands` (does pip there see the Anaconda packages?); (c) using requirements.txt (which certainly doesn't see Anaconda). – orome Jan 24 '15 at 23:48
  • @NickHumrich: Method (a) may not have access to a given package, method (c) will have, but can't see Anaconda – will method (b) work? If not, is there a way to get method (c) to see Anaconda? – orome Jan 24 '15 at 23:49
  • I'm starting to see the problem now. You have extra dependencies that also depend on your yum installed stuff. – Nick Humrich Jan 25 '15 at 00:07
  • @NickHumrich: They depend on [Conda-installed things](https://gist.github.com/orome/696084ac60a58b140304) mostly, in their own site-packages directory. So for example if I have a package that depends heavily on the science stack and I list that in requirements.txt, the installation process will have no idea that all of the dependencies are already satisfied (in the Anaconda directory) and will attempt to install them again (which *must* be avoided, because of the issues described in the linked posts). – orome Jan 25 '15 at 00:20
  • @NickHumrich: I've [added to my question](http://stackoverflow.com/posts/28126424/revisions) to illustrate roughly one approach that shows what I'm trying to do (but fails, and probably isn't the right way to go about it). – orome Jan 25 '15 at 15:13
  • @NickHumrich: I *think I might have it* (yes, it's taken me this long!). In the Console I simply set `PYTHONPATH` to `$PYTHONPATH:/anaconda/lib/python2.7/site-packages`. This works for my app (it no longer needs `sys.path.append`). So the question now becomes: does `requirements.txt` run with the same environment variables as my app (i.e., does it see `/opt/python/current/env`)? – orome Jan 25 '15 at 23:43
  • @NickHumrich: If the answer is yes, then everything will work. Even better, if `requirements.txt` is also run after any `aws:elasticbeanstalk:application:environment` `container_command` setting is applied, then I can extend the `PYTHONPATH` in this way in my `.ebextensions` `config` files (rather than in the Console). – orome Jan 29 '15 at 18:14
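
One way to probe the open question in the last two comments (does a given process see the extended `PYTHONPATH`?) is to start a fresh interpreter with the variable set and inspect its `sys.path`. The following is a minimal sketch, not part of the original thread: the helper name `child_sys_path` is hypothetical, and it assumes the interpreter of interest is the one running the script. It says nothing about the order in which Elastic Beanstalk applies environment settings.

```python
import json
import os
import subprocess
import sys


def child_sys_path(extra_dir):
    """Start a fresh interpreter with extra_dir appended to PYTHONPATH
    and return that interpreter's sys.path as a list of strings.

    child_sys_path is a hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Mirror the "$PYTHONPATH:/some/dir" form used in the comment above.
    env["PYTHONPATH"] = os.pathsep.join([existing, extra_dir]) if existing else extra_dir
    out = subprocess.check_output(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
    )
    return json.loads(out.decode("utf-8"))
```

Calling `child_sys_path("/anaconda/lib/python2.7/site-packages")` and checking whether that directory appears in the result shows whether a child process started with this environment would search it.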

1 Answer

The simplest way to ensure that Anaconda is on the Python modules search path early in the deployment process and for all users is to add a .pth file to the system site packages directory in an .ebextensions file:

files:
  "/opt/python/run/venv/lib/python2.7/site-packages/anaconda.pth":
    mode: "000644"
    owner: root
    group: root
    content: |
      /anaconda/lib/python2.7/site-packages

This has the added advantage of placing Anaconda after the files in the system site packages directory (PYTHONPATH would place them before).
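
The ordering claim can be checked locally with the standard library's `site.addsitedir`, which processes `.pth` files the same way the interpreter does at startup. This is a minimal sketch using temporary directories as stand-ins: `pth_order_demo`, `extra.pth`, and the directory names are hypothetical, not the Elastic Beanstalk paths.

```python
import os
import site
import sys
import tempfile


def pth_order_demo():
    """Create a throwaway site directory containing a .pth file and
    return (index of the site dir, index of the .pth target) in sys.path.

    All names here are hypothetical stand-ins for illustration.
    """
    root = os.path.realpath(tempfile.mkdtemp())
    site_dir = os.path.join(root, "site-packages")    # stand-in for the venv's site-packages
    extra_dir = os.path.join(root, "extra-packages")  # stand-in for the Anaconda site-packages
    os.makedirs(site_dir)
    os.makedirs(extra_dir)  # .pth entries are only added if the directory exists

    # One directory per line, as in anaconda.pth above.
    with open(os.path.join(site_dir, "extra.pth"), "w") as f:
        f.write(extra_dir + "\n")

    # Appends site_dir to sys.path, then appends each existing directory
    # listed in the .pth files found there.
    site.addsitedir(site_dir)
    return sys.path.index(site_dir), sys.path.index(extra_dir)
```

If the first index returned is smaller than the second, packages in the site directory take precedence over those in the directory named by the `.pth` file.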

orome