
I have yet to come across an answer that makes me WANT to start using virtual environments. I understand how they work, but what I don't understand is this: someone like me can have hundreds of Python projects on their drive, almost all of which use the same packages (like pandas and NumPy), yet if each of them lived in its own venv, you'd have to pip install those same packages over and over and over again, wasting a lot of space for no reason. Not to mention if any of those projects also require a package like TensorFlow.

The only real benefit I can see to using venvs in my case is mitigating version issues, but for me that's really not as big an issue as it's portrayed. If any project of mine goes out of date, I just update its packages.

Why install the same dependency for every project when you can install it once, globally, for all of them? I know you can also specify --global-dependencies (or whatever the flag is) when creating a new venv, but since ALL of my Python packages are installed globally (hundreds of dependencies are already pip installed), I don't want the new venv to pull in ALL of them. Can I specify only certain global packages to use in a venv? That would make more sense.

What else am I missing?

UPDATE

I’m going to elaborate and clarify my question a bit as there seems to be some confusion.

I'm not so much interested in understanding HOW venvs work, and I understand the benefits that can come with using them. What I'm asking is:

  1. Why would someone who has (for example) 100 different projects that all require TensorFlow install it into a separate venv for each one? That would mean installing TensorFlow 100 separate times. That's not just a "little" extra space being wasted; that's a lot.

  2. I understand they mitigate dependency versioning issues: you can "freeze" packages at their current working versions and forget about them, great. And maybe I'm just unusual in this respect, but the versioning issue (besides the obvious difference between Python 2 and 3) really hasn't been THAT big of a problem. Yes, I've run into it, but isn't it better practice to keep your projects up to date with the current working/stable versions than to freeze them at old, possibly no-longer-supported versions? Sure it works, but that doesn't seem like the "best" option to me either.

To reiterate the second part of my question: if I have (for example) TensorFlow installed globally, and I create a venv for each of my 100 TensorFlow projects, is there not a way to make use of the already globally installed TensorFlow inside the venv, without having to install it again? I know in PyCharm, and possibly on the command line, you can use a --system-site-packages argument (or whatever it is) to make that happen, but I don't want to include ALL of the globally installed dependencies, because I have hundreds of those too. Is something like --system-site-packages tensorflow a thing?
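For concreteness, here's a rough sketch of the setup I mean (the project path is made up). As far as I can tell, the flag is all-or-nothing; it exposes every global package rather than a chosen few:

```python
# Sketch only: create a per-project venv that can *see* the global site-packages,
# so the globally installed tensorflow is importable without reinstalling it.
# Equivalent CLI: python -m venv --system-site-packages boston_housing/.venv
import venv

builder = venv.EnvBuilder(system_site_packages=True, with_pip=True)
builder.create("boston_housing/.venv")  # hypothetical project directory

# What I can't find is a per-package version of this, e.g. something like
# "--system-site-packages tensorflow" that exposes only ONE global package.
```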

Hope that helps clarify what I'm looking for out of this discussion, because so far I have no use for venvs other than everyone else claiming how great they are, but I guess I see it a bit differently :P

(FINAL?) UPDATE

From the great discussions I've had with the contributors below, here is a summary of where I think venvs are of benefit and where they're not:

USE a venv:

  1. You're working on one BIG project with multiple people, to mitigate versioning issues across the team
  2. You don't plan on updating your dependencies very often for all projects
  3. To have a clearer separation of your projects
  4. To containerize your project (again, for distribution)
  5. Your portfolio is fairly small (this matters especially in the data science world, where packages like TensorFlow are large and used across most projects, so with many venvs you'd be pip installing the same big package into each one)

DO NOT use a venv:

  1. Your portfolio of projects is large AND requires a lot of heavy dependencies (like TensorFlow), and you want to avoid installing the same package into every venv you create
  2. You're not distributing your projects across a team of people
  3. You're actively maintaining your projects and keeping the global dependency versions up to date across all of them (maybe I'm the only one who actually does this, but whatever)

As was recently mentioned, I guess it depends on your use case. For a website that requires contributions from many people at once, it makes sense for everyone to work out of one environment; but for someone like me, with a massive portfolio of TensorFlow projects that have no versioning issues and no other team members, it doesn't. Maybe if you plan on containerizing or distributing a project it makes sense to do so on an individual basis, but to have (going back to this example) 100 TensorFlow projects in your portfolio, it makes no sense to keep 100 different venvs for them, as you'd have to install TensorFlow 100 times, once into each of them. That's no different from having to pip install tensorflow==2.2.0 for a specific old project you want to run, and in that case, just keep your projects up to date.

Maybe I'm missing something else major here, but that's the best I've come up with so far. Hope it helps someone else who's had a similar thought.

wildcat89
  • Ever try working on a team? Or with people who don't necessarily have the best grasp on how Python dependencies work? – 0x263A Jul 02 '22 at 00:11
  • [related](https://stackoverflow.com/questions/41972261/what-is-a-virtualenv-and-why-should-i-use-one) – 0x263A Jul 02 '22 at 00:28
  • @0x263A again, that doesn’t really answer my question. I know how virtual environments “work” in that it mitigates versioning issues, but why have a venv for every one of your projects that all have tensorflow installed on them? That’s ridiculous – wildcat89 Jul 02 '22 at 01:05
  • I don't personally use venv's either, but I also don't maintain projects that have been deployed over the course of over a decade. I work on projects small enough that the cost of updating to the newest versions of everything is not too big a problem. They probably also aren't useful to you, but they are critical to many others. – Aaron Jul 02 '22 at 01:27
  • @Aaron thank you, I’m starting to think I’m in a similar boat. I have worked with teams where having a venv would make sense, but I guess my scope of work is similar to what you’ve described. The motivation for my question came from wanting to deploy my django app to pythonanywhere for my little portfolio site, but the guide is tailored towards venv’s so I was like uggghhh why! :’D thanks for the clarification. I’ve updated my question above if you have an answer for my specific syntax question, but thanks for your input! – wildcat89 Jul 02 '22 at 01:30
  • The point is to isolate dependencies. If you don't require it, then there isn't much point. But most people will work on projects with various dependencies. Frankly, it sounds like you have *one project* – juanpa.arrivillaga Jul 02 '22 at 01:53
  • Ah I see why that'd throw you for a loop. I'm gambling that you primarily do data science (?) where a venv wouldn't be as useful as it is for django, which could conceivably be a significant component of a company's product/service that you *really* don't want to break – 0x263A Jul 02 '22 at 01:54
  • @juanpa.arrivillaga LOL sure dude, if I only had one project, then I could see the benefit to using them! – wildcat89 Jul 02 '22 at 01:59
  • @0x263A That's exactly it, I'm not working in one environment with 10's or more people also working on the same thing. I think the real benefit is to freeze the dependencies so you can share the work with others without issue, but to install tensorflow 100 times in different env's for my use doesn't really make much sense. Thanks for your input! – wildcat89 Jul 02 '22 at 02:00
  • If you have 100 "projects" (probably a bunch of jupyter notebooks?) and they all use your global installation of Python then some would say you have a singular Python installation (re: project) that has a bunch of modules scattered about memory. – 0x263A Jul 02 '22 at 02:09
  • Fair, although mine are generally in script format as jupyter notebooks get too slow/RAM heavy when they get too long. Someone mentioned that if you have 100 packages that all require a different version of TF, any time you wanted to run one, you'd have to globally replace TF for that one package to run, and I get that, but that's not really much different than pip installing it for each venv in the end anyway, and that's not what I do anyway, any package that I'm recently working on runs with the newest versions of dep's anyway, so I think for me (so far) they don't add much benefit. – wildcat89 Jul 02 '22 at 02:13
  • FWIW, certain virtual environment managers, e.g. `conda` will cache dependencies. – juanpa.arrivillaga Jul 02 '22 at 02:22
  • And my point is, what you are calling a "project" might not be what people mean when they say project. It sounds like you have *a hundred scripts* all part of the same project. This is obviously a matter of terminology – juanpa.arrivillaga Jul 02 '22 at 02:24
  • @juanpa.arrivillaga Well no, they are not all part of the same project. I have separate "projects" that all have their own separate scripts for running. I would call those "projects", yes? Simple example, I have a Boston Housing Market Price Regression project, and an Iris Classification project. They're completely separate "projects", but nonetheless they both require Tensorflow. In my case, I would NOT use a venv for each of them cuz I'd have to install Tensorflow for each "project", rather than just using one global install. This is a small example, but extrapolate that to 100 projects... – wildcat89 Jul 02 '22 at 02:28
  • __"This is obviously a matter of terminology"__ – 0x263A Jul 02 '22 at 02:29
  • @juanpa.arrivillaga I think you get the idea. It just doesn't make sense for what I am doing in Python. When I start working across bigger projects with many team members, it makes sense, but I'm fairly diligent in keeping my recent projects up to date with current dependencies so I don't need em! – wildcat89 Jul 02 '22 at 02:29
  • My point is that I think you are thinking of a "project" in a different sense here. Let me put it this way, I would have one repo with all these scripts with a single environment definition – juanpa.arrivillaga Jul 02 '22 at 02:30
  • ...Isn't that what I've just described? – wildcat89 Jul 02 '22 at 02:31
  • I must not be understanding. How is having two completely separate Tensorflow projects, that have different datasets, NOT dependent on each other in any way, and (if you were using venv's for them) have their own separate venv's, the same "project"??? – wildcat89 Jul 02 '22 at 02:33

5 Answers

2

I'm a data scientist and sometimes I run into these things called "virtual environments" and I don't get what the use case is? I already have all of these packages and modules and widgets downloaded! Why should I set up a separate place where I manage all of the stuff I'm already managing globally?

Python is a very powerful tool. In this answer, consider two ways to swing the metaphorical hammer:

  • Data Science
  • Software Engineering

For a data scientist (working alone) using Python to write a PoC for a research paper, build an LSTM NN, or predict the price of TSLA from the frequency of Elon Musk's tweets, all that really matters is being able to use the best library (tensorflow, pytorch, sklearn, ...) for whatever task they're trying to get done, in whatever directory they happen to be working in. It is very tempting to use one global Python installation and just use the same stuff everywhere. Frankly, this is probably fine, as it's just one person managing their own space. The configuration of their machine would be one single Python environment, and everything, everywhere uses it. Or, if the data scientist wanted to, they could have a single directory that contains a virtual environment and some subdirectories containing all the scripts (projects) they work on.

Now consider a software engineer who has multiple git repos with complete CI/CD pipelines that each build into separate entities that then get deployed to some cloud environment. They and the nine other people on their team need to be sure that they are all making changes that won't break any piece of the code. For example, around Python 3.6/3.7 the function dict.popitem subtly changed from returning an arbitrary element of the dict to guaranteed LIFO order. It's pretty easy to see how that could cause issues if Jerry had implemented a function that relies on the original arbitrary behavior and Bob implemented a function that relies on the LIFO guarantee. This team of engineers would have git repos that each contain a single virtual environment (a single isolated Python environment) that allows them to manage dependencies for that "project".
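A quick illustration of that behavioral difference (deterministic on Python 3.7+, an arbitrary pair on older interpreters):

```python
# dict.popitem(): LIFO order is guaranteed on Python >= 3.7.
d = {"jerry": 1, "bob": 2, "last": 3}
print(d.popitem())  # ('last', 3) on 3.7+; an arbitrary pair on older versions
print(d.popitem())  # ('bob', 2) next, still last-in, first-out
```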

The data scientist has one Python installation/environment that allows them to do whatever.

The engineer has a Python installation and a bunch of environments so that they can work across multiple repos with multiple people and (hopefully) nothing breaks.

0x263A
  • So, how are my projects not considered separate "projects"? XD LOL I'M KIDDING DON'T INDULGE ME. Yes, I see the majority of my work as part of the Data Science umbrella so all of what you mentioned speaks more to me than the engineering side. Great answer, thanks for doing this! – wildcat89 Jul 02 '22 at 03:32
1

I can see where you're coming from with your question. It can seem like a lot of work to set up and maintain multiple virtual environments (venvs), especially when many of your projects might use similar or even the same packages.

However, there are some good reasons for using venvs even in cases where you might be tempted to just use a single global environment. One reason is that they give you a clear separation between your different projects. That helps with organization, but it also matters if you need to use different versions of packages in different projects.

If you try to share a single venv among all of your projects, it can be difficult to use different versions of packages in those projects when necessary. This is because the packages in your venv will be shared among all of the projects that use that venv. So, if you need to use a different version of a package in one project, you would need to change the version in the venv, which would then affect all of the other projects that use that venv. This can be confusing and make it difficult to keep track of what versions of packages are being used in which projects.

Another issue with sharing a single venv among all of your projects is that it can be difficult to share your code with others. This is because they would need to have access to the same environment (which contains lots of stuff unrelated to the single project you are trying to share). This can be confusing and inconvenient for them.

So, while it might seem like a lot of work to set up and maintain multiple virtual environments, there are some good reasons for doing so. In most cases, it is worth the effort in order to have a clear separation between your different projects and to avoid confusion when sharing your code with others.

  • This was the best answer I’ve read so far. The clear separation thing makes sense, and maybe my situation is unique in that I have over (maybe) 100 different projects that use tensorflow, so to put all those into their own venv’s just to separate them doesn’t REALLY seem like a great benefit. Just seemed a heck of a lot easier to have everything installed globally and just keep your projects up to date, than freeze a venv with an old version of something anyway? – wildcat89 Jul 02 '22 at 01:13
  • …and add to the memory waste issue I guess. But I’ve updated my question above if you wanna have another read through :) if not I understand too haha I feel as though I’m in the minority here, and maybe it’s just because I have no real need for venv’s, I just don’t see them as the best option (yet) – wildcat89 Jul 02 '22 at 01:27
1

It's the same principle as single-user vs multi-user, virtualization vs no virtualization, containers vs no containers, monolithic apps vs microservices, and so on: to avoid conflicts, maintain order, and easily identify a failure state, among other reasons such as scalability or portability. Apply it if necessary, always keeping the KISS philosophy in mind as well: manage complexity, don't create more of it.

And, as you have already mentioned, resources are finite.

Besides, a set of projects that all share the same base of dependencies is of course not the best showcase for the need for separation. In addition, the tooling keeps evolving to avoid redundant copies of a commonly used base of resources.

Pepe N O
  • 1,678
  • 1
  • 7
  • 11
0

Well, there are a few advantages:

  • with virtual environments, you have explicit knowledge of your project's dependencies: without them, your actual environment is going to be a yarn ball of old and new libraries, dependencies, and so on, whereas with them, if you want to deploy a thing somewhere else (which may just mean running it on the new computer you just bought) you can reproduce the environment it was working in
  • you're eventually going to run into something like the following issue: project alpha needs version 7 of library A, but project beta needs library B, which only runs on version 3 of library A. If you install version 3, alpha will probably die, but you really need to get B working (see the sketch after this list).
  • it's really not that complicated, and it will save you a lot of grief in the long term.
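To make the second point concrete, here is a minimal sketch of keeping both projects runnable side by side; the package names, versions, and directory layout are invented for illustration:

```python
# Hypothetical conflict: "alpha" is pinned to libA 7.x, while "beta" needs libB,
# which only works with libA 3.x. One venv per project lets both pins coexist.
import subprocess
import venv


def make_project_env(project_dir, requirements):
    """Create <project_dir>/.venv and install that project's pinned deps into it."""
    env_dir = f"{project_dir}/.venv"
    venv.EnvBuilder(with_pip=True).create(env_dir)
    # Run pip with the venv's own interpreter so packages land inside the venv.
    python = f"{env_dir}/bin/python"  # on Windows: <env_dir>\Scripts\python.exe
    subprocess.check_call([python, "-m", "pip", "install", *requirements])


make_project_env("alpha", ["libA==7.0"])  # alpha keeps the new libA
make_project_env("beta", ["libB==1.4"])   # beta's install pulls in its own old libA
```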
  • Again, I understand how virtual environments work and their benefits, but why would I want to have, let’s say, 100 different projects, that all require tensorflow, in their own venv’s? That’s not an efficient way of handling your dependencies. Please re-read my question. I’m not looking for the benefits of venv’s. – wildcat89 Jul 02 '22 at 01:09
  • oh. sure, you don't have to have like one *for every distinct project you have*. maybe you just need one, if you do sufficiently similar things. even so, you could, say, move completed projects to safe environments so that libraries you install for new projs don't break your working code or something. – Tomas Boncompte Jul 02 '22 at 04:22
  • I'm glad I wasn't COMPLETELY out to lunch with my thinking haha just needed some direction/clarification on particular use cases, which of course are different for each situation. Thanks for your input!! – wildcat89 Jul 02 '22 at 04:50
0

There are several motivations for venvs, or for their moral equivalent: conda environments.

1. author a package

You create a cool "scrape my favorite site" package which graphs a timeseries of some widget product. Naturally it depends on BeautifulSoup. You happened to have html5lib 1.1 lying around from some previous project, so you tested with that. A user downloads your scrape-widget package from PyPI, happens to have lxml 4.7.1 available, and finds that scraping crashes when using that library. Wouldn't it have been better for your package to specify that users should run against the same deps you tested with?
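A minimal sketch of what "specify the deps you tested with" can look like, as a bare-bones setup.py (the version ranges are only an example, not a recommendation):

```python
# Hypothetical packaging metadata for the scrape-widget example above.
from setuptools import find_packages, setup

setup(
    name="scrape-widget",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "beautifulsoup4>=4.11,<5",  # the API the scraper was written against
        "html5lib>=1.1,<2",         # the parser it was actually tested with
    ],
)
```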

2. use a package

Same scenario, but now you're using someone's scrape-widget package. The author tested with lxml 4.7.1, but you have lxml 4.9.1, which behaves differently, so the app behaves differently too, crashing in ways the author never saw.

3. use two packages

You want to run both scrape-frobozz-magic-widgets and scrape-acme-widget. Their authors tested using different versions of requests, and of lxml. Changing a dep changes the app's behavior. You can only use one or the other, unless you're willing to re-run pip quite frequently.

4. collaborate on a team

You write code that has deps. So does your colleague. You have to coordinate things, so that a test passing on one laptop gives confidence it would also pass on the other laptops.

5. use CI

You have a teammate named Jenkins, and want to communicate to him that you used a specific version of a dep when you saw the test succeed.

6. get a new laptop

Things were working. Then your laptop exploded, you got a new one, and you (quickly) want to see things working again. Some of your deps were pinned at older versions, due to recently released bugs and breaking changes. Reading a file full of dep versions from your GitHub repo lets you immediately reproduce the state of the world from back when things were working.
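A sketch of that round trip, assuming the conventional requirements.txt file name (equivalent CLI: `pip freeze > requirements.txt`, then `pip install -r requirements.txt` on the new machine):

```python
# On the old laptop, while things still work: record the exact installed versions.
import subprocess
import sys

with open("requirements.txt", "w") as f:
    subprocess.check_call([sys.executable, "-m", "pip", "freeze"], stdout=f)

# On the new laptop, inside a fresh venv: replay them.
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]
)
```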

J_H
  • Again, I understand how virtual environments work and their benefits, but why would I want to have, let’s say, 100 different projects, that all require tensorflow, in their own venv’s? Please re-read my question. I’m not looking for the benefits of venv’s. – wildcat89 Jul 02 '22 at 01:08
  • Given 100 projects, which all have deps, some will have incompatibilities, one needs TF 2.6.2 and another needs 2.9.1. If you like re-running `pip` each time you switch between those projects, fine, don't use venv. Most common way for that situation to arise is that one project and/or its dep is "stuck in time", five years old, no maintainer, and it cannot accept modern breaking change in one of its deps. So _all_ its deps are years old. Works great, until you want to run a project that requires a recently added TF feature. – J_H Jul 02 '22 at 01:31
  • I understand, and I appreciate your input! I just see it differently. I’m not concerned with the versioning issue AT ALL. I don’t re-run pip for each project either, most of my projects are kept up to date globally anyway, maybe I’m literally the only one who does that lol still doesn’t make sense to me to waste all that disk space installing 100 different tensorflow installations, all with potentially different versions, just to freeze it on an old version. My catalogue is fairly big and I haven’t had issues keeping things up to date I guess? – wildcat89 Jul 02 '22 at 01:35