I'm perplexed about how best to use pip in the face of security concerns about malicious packages or install scripts. I'm not much of a security expert, so I may just be confused (bear with me), but it seems that there are four, possibly overlapping, approaches:
(1) Use sudo pip for everything
This is how I do things now. I generally do not need virtualenvs and like the convenience of having all my packages work for all my tools. I also don't install a lot of experimental packages, sticking pretty much to the well-known and widely used ones (matplotlib, six, etc.).
I gather this can be a risky approach, though, because the installation process has su privileges and could potentially do anything; however, it has the advantage of protecting the site-packages directory from subsequent mischief by anything (not just packages) running as non-su after an install.
This approach also can't be completely avoided, as some packages (pip itself) need it to bootstrap any Python installation.
(2) Create a pip user and give it ownership of site-packages
This would seem to have the advantage of restricting what pip can do: all it can do is install to site-packages. But I'm not sure about side effects, or whether it would even work (when, for example, pip needs to put things in other locations). A more realistic variant is to set things up this way, and run pip as "pip-user" when that works and as su when it doesn't.
(3) Give myself ownership of site-packages
I gather this is a very bad idea, but I'm not sure quite why. It would mean that any code I run would be able to tamper with site-packages; but it would also mean that malicious install scripts could only damage things I can damage myself anyway.
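As a way to see where a given installation currently stands, here is a minimal sketch that reports who owns the interpreter's site-packages directories and whether the current user can write to them. The function name describe_site_packages is made up for illustration; the sketch uses only the standard library and assumes the directories reported by sysconfig are the ones pip installs into, which may not hold for every distribution's Python.

```python
import os
import sysconfig


def describe_site_packages():
    """Return (path, owner_uid, writable_by_me) for each site-packages dir.

    Hypothetical helper for illustration only.
    """
    paths = sysconfig.get_paths()
    # "purelib" and "platlib" are where pure-Python and compiled packages go;
    # on many systems they are the same directory, hence the set.
    candidates = {paths["purelib"], paths["platlib"]}
    results = []
    for path in sorted(candidates):
        if not os.path.isdir(path):
            continue
        owner_uid = os.stat(path).st_uid          # numeric owner of the directory
        writable = os.access(path, os.W_OK)       # can the current user write here?
        results.append((path, owner_uid, writable))
    return results


if __name__ == "__main__":
    for path, uid, writable in describe_site_packages():
        print(f"{path}: owner uid={uid}, writable by current user={writable}")
```

If the directory is reported as writable by the current user, then any code that user runs can modify installed packages, which is the trade-off described in (3).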
(4) "Use a virtualenv"
This suggestion comes up a lot, but I don't see how it helps. It seems no different from (3) to me, since it creates a site-packages that I own.
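The ownership point can be checked directly. The sketch below is an illustration under stated assumptions: it uses the standard-library venv module rather than the virtualenv tool itself, the helper name make_env_and_find_site_packages is hypothetical, and with_pip=False is chosen only to keep the example fast and offline. It creates a throwaway environment and lists the site-packages directories inside it along with whether the current user can write to them.

```python
import os
import tempfile
import venv


def make_env_and_find_site_packages(env_dir):
    """Create a virtual environment and return its site-packages paths.

    Hypothetical helper for illustration only.
    """
    # with_pip=False skips bootstrapping pip, so no network access is needed.
    venv.create(env_dir, with_pip=False)
    found = []
    for root, dirs, _files in os.walk(env_dir):
        if "site-packages" in dirs:
            found.append(os.path.join(root, "site-packages"))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "env")
        for path in make_env_and_find_site_packages(env_dir):
            writable = os.access(path, os.W_OK)
            print(f"{path}: writable by current user={writable}")
```

The environment's site-packages is created by, and writable by, whoever ran the command, so no elevated privileges are involved in installing into it.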
Which of these approaches, or combinations of approaches, if any, is best for ensuring that using pip does not expose my system? My concern is mostly with my system as a whole, and only secondarily with my Python installation in site-packages (which I can always rebuild if need be).
Part of the problem I have is that I don't know how to weigh the risks. One approach that seems to make sense to my limited understanding is simply to do (1) for the most part, and use a virtualenv (4) for any package that I worry might damage my site-packages. Anything I've installed will still be able to damage anything I have access to, but that seems unavoidable, and at least things I don't have access to will be safe (except during the installation process itself). But I have trouble evaluating whether the protection this affords is worth the risk it creates.