My layout is as follows:

I have several Python projects under ~/projects, each with the following structure:

~/projects/$project_name/env                      #This is the virtualenv
~/projects/$project_name/scripts                  #This is where the code actually lives
~/projects/$project_name/scripts/requirements.txt #This helps keep track of this project's dependencies

Now, this setup works great as it does the following:

  1. Each project has its own dependencies in its corresponding env.
  2. I can easily redeploy a project somewhere else by cloning the scripts directory, creating a new virtualenv, and running pip install -r requirements.txt.

The main downside of this setup is that I have multiple copies of the same packages in multiple virtual environments. I regularly end up with a couple of hundred megs for each virtual environment.
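
To put a number on that, a rough sketch like the one below could report the size of each project's env directory. It assumes the ~/projects/$project_name/env layout described above; the helper names dir_size and env_sizes are made up for illustration.

```python
import os
from pathlib import Path


def dir_size(path):
    """Return the total size in bytes of the files under path.

    Symlinked files are skipped and symlinked directories are not followed.
    Hard-linked files are counted once per name, so this can overstate
    real disk usage if links are in play.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def env_sizes(projects_root):
    """Map each project name to the size of its env/ directory, if it has one."""
    sizes = {}
    for project in sorted(Path(projects_root).iterdir()):
        env_dir = project / "env"
        if env_dir.is_dir():
            sizes[project.name] = dir_size(env_dir)
    return sizes


if __name__ == "__main__":
    root = Path.home() / "projects"  # layout assumed from the question
    if root.is_dir():
        for name, size in env_sizes(root).items():
            print(f"{name}: {size / 1e6:.1f} MB")
```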

My question is:

Is there a way to share packages between multiple virtualenvs?

Things I've tried that do not work:

virtualenv --system-site-packages. This makes the system-wide packages available in the virtualenv, but:

  1. it makes it impossible to get a list of specific dependencies
  2. I can't have multiple versions of the same dependency installed (e.g. pandas 0.16 vs pandas 0.15) which I need, as different projects have different needs.

virtualenv --extra-search-dir=/path/to/dist only works for pip, AFAICT, so not good for me.

Dale K
Henry Henrinson
  • I believe that this idea somewhat contradicts the purpose of virtual environments. I would leave it the way it is, personally. Sorry that this is not a direct answer to your question. – Sam Creamer Sep 04 '19 at 13:37
  • Interesting - what makes you say it would contradict the purpose of a virtualenv? You can sort of get this behaviour by making directories in env/lib/python3.5/site-packages be symlinks instead of actual copies of the code. – Henry Henrinson Sep 04 '19 at 13:39
  • Is 100MB a problem with the current prices for storage? As long as you don't include your virtual environment in your source control, I don't see the problem. – Maarten Fabré Sep 04 '19 at 13:41
  • @Maarten Yes, it is. – Henry Henrinson Sep 04 '19 at 13:42
  • Duplicate? https://stackoverflow.com/questions/57759298/using-library-installed-in-one-virtual-environment-in-another-virtual-environmen#comment101954901_57759298 – Neil Sep 04 '19 at 13:42
  • Though I'm inclined to agree that the storage issue is not trivial. I'm also low on space partly due to this. – Neil Sep 04 '19 at 13:44
  • I think part of the reason this is a bad idea, though, is that virtualenvs aren't just packages, they're versions of packages. What happens if you have x package that is version 1.5 in all your modules and then one module needs 1.6? In theory there should be a lot less overlap of dependencies than you seem to have. – Neil Sep 04 '19 at 13:46
  • There's also this: https://stackoverflow.com/questions/50301939/how-to-share-packages-between-virtual-environments-using-conda-or-virtualenv/50303308 – Neil Sep 04 '19 at 13:49
  • @Neil I agree that virtualenv wants to keep track of individual package versions (and it's useful that it does!). What I don't like is that if I have 3 projects with the same pandas version, I have 3 copies of the code. As a hack, I think you can solve this by putting all the individual pandas versions in a common repo, and setting symlinks "appropriately" in `env/lib/python3.5/site-packages`. What "appropriately" means isn't trivial, but looks doable. – Henry Henrinson Sep 04 '19 at 13:50
  • Cool, makes sense. – Neil Sep 04 '19 at 13:50
  • @Neil Looks like a common request, although maybe not in my exact form. Looks like the answer will turn out to be "virtualenv does not support it". – Henry Henrinson Sep 04 '19 at 13:51
  • Yes, but on reflection I think your question is more about tricking virtualenv, in particular for large packages like pandas. So yeah, if I knew an answer I'd probably answer this question. But not the others. – Neil Sep 04 '19 at 13:52
  • @HenryHenrinson I agree that it does seem doable, I just prefer the absoluteness of separate environments. I see virtualenvs as containers, so to speak. It may be more efficient to share resources in some cases to save some disk space, but I can still see some potential conflicts. For example, you may be using the same version of a package in three different projects, but what happens when one of the projects needs a new version? – Sam Creamer Sep 04 '19 at 13:53
  • I believe that this would require a pretty sophisticated virtual environment system, and I do think that it could be developed and could be useful. If you could develop a system that would keep resource use to a minimum while keeping environments logically separate, I would use it! – Sam Creamer Sep 04 '19 at 13:55
  • @SamCreamer If you need a new package version, you can install it just for that package (e.g. add the code to a common repo, and change the proposed symlinks) – Henry Henrinson Sep 04 '19 at 13:55
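
The symlink idea from the comments above could be sketched roughly as follows. This is only an illustration, not something virtualenv or pip supports: the shared store layout (store/<package>/<version>/<package>) and the function name link_package are hypothetical, and a real package usually also ships metadata and dependencies that this sketch ignores, so pip will not know the package is there.

```python
import os
from pathlib import Path


def link_package(store, site_packages, name, version):
    """Symlink site_packages/<name> to store/<name>/<version>/<name>.

    The store layout is an assumption made for this sketch; nothing in
    virtualenv or pip defines it.
    """
    source = Path(store).resolve() / name / version / name
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not in the shared store")

    target = Path(site_packages) / name
    if target.is_symlink():
        # Re-point an existing link, e.g. when one project needs a new version.
        target.unlink()
    elif target.exists():
        raise FileExistsError(f"{target} is a real copy; remove it first")

    os.symlink(source, target, target_is_directory=True)
    return target
```

With this approach, a project that needs a different version only has its own link re-pointed; the other environments keep pointing at the version they were given.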

1 Answer

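Scrap the comment, maybe I do know an answer. It appears that Anaconda's package management system (conda) links packages into each environment from a shared package cache rather than copying them; as far as I can tell these are hard links where the filesystem allows it. So that would basically be a virtualenv but with the feature you want. See here: "How to free disk space taken up by (ana)conda?"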

That said, there's a large initial hard disk cost to using Conda, so investigate a bit more and decide if it will work for you.
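
One way to investigate is to check how much of an environment is actually shared. The sketch below assumes that shared files show up as hard links (as far as I know, that is how conda typically links from its package cache when the filesystem permits); the function name hardlink_usage is made up for illustration. A link count above 1 only means the file has another name somewhere on the same filesystem, so treat the result as an estimate.

```python
import os
import stat


def hardlink_usage(env_dir):
    """Return (shared_bytes, total_bytes) for regular files under env_dir.

    A file counts as shared when its hard-link count is greater than 1.
    Symlinks are ignored.
    """
    shared = 0
    total = 0
    for root, _dirs, files in os.walk(env_dir):
        for name in files:
            info = os.lstat(os.path.join(root, name))
            if not stat.S_ISREG(info.st_mode):
                continue
            total += info.st_size
            if info.st_nlink > 1:
                shared += info.st_size
    return shared, total
```

If the shared figure is close to the total, most of the environment's apparent size is not taking up extra disk space.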

Neil