156

I install a lot of the same packages in different virtualenv environments. Is there a way that I can download a package once and then have pip install from a local cache?

This would reduce download bandwidth and time.

Matthew Rankin
  • Note that as of pip 6.0 (2014-12-22), pip will cache by default. See https://pip.pypa.io/en/stable/reference/pip_install.html#caching for details. – Pi Delport Feb 24 '15 at 08:04
  • It doesn't just reduce download bandwidth and time; it can also eliminate the time spent crawling the PyPI index to check available versions of packages, and if you are caching wheels, it can eliminate the time spent building wheels for packages that don't provide them. It adds up to a very substantial speed boost. – Jonathan Hartley Feb 05 '19 at 13:49

11 Answers

130

Updated Answer 19-Nov-15

According to the Pip documentation:

Starting with v6.0, pip provides an on-by-default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.

Therefore, the updated answer is to just use pip with its defaults if you want a download cache.
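
For reference, here is a minimal sketch of inspecting or bypassing the built-in cache from a shell. The PIP variable and the two function names are illustrative assumptions, not part of pip. `pip cache dir` assumes pip 20.1 or newer; `--no-cache-dir` is the option named in the documentation quoted above.

```shell
# PIP is a hypothetical override (not a pip feature) so the commands can be
# dry-run, e.g. with PIP=echo; it defaults to the real pip.
PIP="${PIP:-pip}"

# Print the directory pip uses for its cache (assumes pip >= 20.1).
show_pip_cache_dir() {
    "$PIP" cache dir
}

# Install packages without using the cache at all.
install_uncached() {
    "$PIP" install --no-cache-dir "$@"
}
```

With these defined, `install_uncached some-package` forces a fresh download, while a plain `pip install some-package` uses the cache.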

Original Answer

From the pip news, version 0.1.4:

Added support for an environmental variable $PIP_DOWNLOAD_CACHE which will cache package downloads, so future installations won’t require large downloads. Network access is still required, but just some downloads will be avoided when using this.

To take advantage of this, I've added the following to my ~/.bash_profile:

export PIP_DOWNLOAD_CACHE=$HOME/.pip_download_cache

or, if you are on a Mac:

export PIP_DOWNLOAD_CACHE=$HOME/Library/Caches/pip-downloads

Notes

  1. If a newer version of a package is detected, it will be downloaded and added to the PIP_DOWNLOAD_CACHE directory. For instance, I now have quite a few Django packages.
  2. This doesn't remove the need for network access, as stated in the pip news, so it's not the answer for creating new virtualenvs on the airplane, but it's still great.
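
The two exports above can be folded into one snippet that picks the directory by platform. This is only a sketch and only relevant to pip versions older than 6.0, where PIP_DOWNLOAD_CACHE is still honoured. The cache_dir variable is illustrative, and the XDG fallback for non-Mac systems is an assumption borrowed from a comment on this answer rather than from the pip news.

```shell
# Choose a download-cache directory by platform (legacy pip < 6.0 only).
case "$(uname -s)" in
    # macOS: follow the ~/Library/Caches convention
    Darwin) cache_dir="$HOME/Library/Caches/pip-downloads" ;;
    # elsewhere: use the XDG cache directory, falling back to ~/.cache
    *)      cache_dir="${XDG_CACHE_HOME:-$HOME/.cache}/pip" ;;
esac
export PIP_DOWNLOAD_CACHE="$cache_dir"
```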
Matthew Rankin
  • Maybe a better idea is to put it into .bashrc, because .bash_profile is executed only during login. That's up to you, and anyway it's good advice :) – Nikita Hismatov May 24 '12 at 09:31
  • On Macs it is loaded at the beginning of any shell. – saul.shanabrook Jul 13 '12 at 13:16
  • PIP_DOWNLOAD_CACHE is seriously flawed and I wouldn't recommend using it for things like getting packages out to your deployment machines. It also still relies on pypi.python.org being reachable. Great for a local development cache, but not suitable for heavier uses. – slacy Sep 25 '12 at 18:35
  • @slacy Could you comment on why it is seriously flawed? If you don't want PyPI to be reachable, that's what --no-index is for; a download cache is surely orthogonal to reaching PyPI or not! – lvh Dec 01 '13 at 11:51
  • @lvh [slacy's answer below](http://stackoverflow.com/a/12147405/648162) explains why Pip's download cache is flawed. I've also seen pip install taking longer with cache enabled, bizarrely. [pip-accel](http://stackoverflow.com/a/26162326/648162) and [basket](http://stackoverflow.com/a/22576769/648162) appear to be better options. – qris Dec 23 '14 at 11:56
  • If you are using this method, please use the XDG cache dir, ie, `export PIP_DOWNLOAD_CACHE="${XDG_CACHE_HOME:-$HOME/.cache}/pip"` – Chris Lamb Nov 12 '15 at 16:20
53

In my opinion, pip2pi is a much more elegant and reliable solution for this problem.

From the docs:

pip2pi builds a PyPI-compatible package repository from pip requirements

pip2pi allows you to create your own PyPI index by using two simple commands:

  1. To mirror a package and all of its requirements, use pip2tgz:

    $ cd /tmp/; mkdir packages/
    $ pip2tgz packages/ httpie==0.2
    ...
    $ ls packages/
    Pygments-1.5.tar.gz
    httpie-0.2.0.tar.gz
    requests-0.14.0.tar.gz
    
  2. To build a package index from the previous directory:

    $ ls packages/
    Pygments-1.5.tar.gz
    httpie-0.2.0.tar.gz
    requests-0.14.0.tar.gz
    $ dir2pi packages/
    $ find packages/
    packages/
    packages/httpie-0.2.0.tar.gz
    packages/Pygments-1.5.tar.gz
    packages/requests-0.14.0.tar.gz
    packages/simple
    packages/simple/httpie
    packages/simple/httpie/httpie-0.2.0.tar.gz
    packages/simple/Pygments
    packages/simple/Pygments/Pygments-1.5.tar.gz
    packages/simple/requests
    packages/simple/requests/requests-0.14.0.tar.gz
    
  3. To install from the index you built in step 2, use:

    pip install --index-url=file:///tmp/packages/simple/ httpie==0.2
    

You can even mirror your own index to a remote host with pip2pi.

K Z
  • +1 pip2pi works great!! I don't like relying on network connectivity that much. It fails when you most need it. – MGP Jun 13 '13 at 16:33
  • This works great, and it answers my question http://stackoverflow.com/questions/18052217/how-to-create-local-own-pypi-repository-index-without-mirror/ . Can you answer there as well? – Larry Cai Aug 06 '13 at 00:45
  • Maybe it was implied, but it's worth mentioning explicitly: `pip2tgz` detects if you have already downloaded the package to the designated directory, so if you run the same install line or several install lines that have overlapping dependencies, it will only download each package once. – clacke Mar 21 '14 at 23:33
35

For newer Pip versions:

Newer Pip versions now cache downloads by default. See this documentation:

https://pip.pypa.io/en/stable/topics/caching/

For older Pip versions:

Create a configuration file named ~/.pip/pip.conf, and add the following contents:

[global]
download_cache = ~/.cache/pip

On OS X, a better path to choose would be ~/Library/Caches/pip since it follows the convention other OS X programs use.
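
On pip 6.0 and later, the download_cache key is no longer used. The corresponding setting for relocating the built-in cache is cache-dir, the config-file form of the --cache-dir option. Here is a minimal sketch of writing such a file. The function name and its arguments are hypothetical, and the sketch overwrites whatever file is at the given path.

```shell
# Write a minimal pip.conf that points pip's cache at a chosen directory.
# Hypothetical helper: the name and arguments are illustrative only.
# Note: this overwrites any existing file at the given path.
write_pip_cache_conf() {
    conf_file="$1"   # e.g. "$HOME/.pip/pip.conf"
    cache_dir="$2"   # e.g. "$HOME/.cache/pip"
    mkdir -p "$(dirname "$conf_file")" || return 1
    printf '[global]\ncache-dir = %s\n' "$cache_dir" > "$conf_file"
}
```

For example, `write_pip_cache_conf "$HOME/.pip/pip.conf" "$HOME/.cache/pip"` mirrors the layout shown above with the newer key.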

Flimm
  • And if I wanted to store them globally for other users of the same PC to access? How would I do that? I figure the config file would have to be placed in */etc* or something. – Batandwa Jan 07 '14 at 19:42
  • @batandwa: That might work. If not, you could try this: make sure that all the users have a `pip.conf` with a `download_cache` setting that points to the same system-wide directory. – Flimm Jan 07 '14 at 20:29
  • The link in the answer is not directly working anymore. This is probably the new location: https://pip.pypa.io/en/stable/topics/caching/ – luator Apr 28 '22 at 06:43
  • @luator Thanks, I've edited the answer. (Feel free to edit the answer yourself in the future, for edits like this.) – Flimm Apr 28 '22 at 08:24
31

PIP_DOWNLOAD_CACHE has some serious problems. Most importantly, it encodes the hostname of the download into the cache, so using mirrors becomes impossible.

The better way to manage a cache of pip downloads is to separate the "download the package" step from the "install the package" step. The downloaded files are commonly referred to as "sdist files" (source distributions) and I'm going to store them in a directory $SDIST_CACHE.

The two steps end up being:

pip install --no-install --use-mirrors -I --download=$SDIST_CACHE <package name>

This downloads the package and places it in the directory pointed to by $SDIST_CACHE. It does not install the package. Then you run:

pip install --find-links=file://$SDIST_CACHE --no-index --index-url=file:///dev/null <package name> 

This installs the package into your virtual environment. Ideally, $SDIST_CACHE would be committed under your source control. When deploying to production, you would run only the second pip command to install the packages without downloading them.
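
On recent pip versions the --no-install and --download options are no longer available (a comment on this answer reports this for pip 18.1). The same download/install split can be sketched with `pip download` (pip 8.0 or newer) driven by a requirements file. The PIP variable and both function names are illustrative assumptions, not part of pip.

```shell
# PIP is a hypothetical override (not a pip feature) so the commands can be
# dry-run, e.g. with PIP=echo; it defaults to the real pip.
PIP="${PIP:-pip}"

# Step 1 (online): fetch everything listed in a requirements file, plus
# dependencies, into a directory. Arguments: <directory> <requirements file>.
fetch_requirements() {
    "$PIP" download --dest "$1" --requirement "$2"
}

# Step 2 (offline): install from that directory only, never contacting an index.
install_from_cache() {
    "$PIP" install --no-index --find-links "$1" --requirement "$2"
}
```

A deployment would then run only something like `install_from_cache "$SDIST_CACHE" requirements.txt`.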

slacy
  • Gabriel -- It's not downloaded twice, just once in the first step and then installed from local cache in the second. What are you seeing? – slacy Sep 25 '12 at 18:34
  • If I run the first step twice, it'll download it twice, right? At least it happened here. I'll need to know that the first step has been executed for this package at least once before executing it, otherwise it'll download the same file twice. How can I check either if I need to execute it or it has been downloaded before? – Gabriel Jordão Sep 25 '12 at 22:49
  • You probably just want to use pip2pi as the other answer suggests. :) – slacy Sep 26 '12 at 16:49
  • does this download the dependencies as well? – monkut Jul 11 '13 at 03:11
  • I use pip 18.1 and option --no-install is not present. Any idea on how to update this answer? – paolof89 Nov 29 '18 at 17:09
14

Starting in version 6.0, pip now does its own caching:

  • DEPRECATION pip install --download-cache and pip wheel --download-cache command line flags have been deprecated and the functionality removed. Since pip now automatically configures and uses its internal HTTP cache which supplants the --download-cache the existing options have been made non functional but will still be accepted until their removal in pip v8.0. For more information please see https://pip.pypa.io/en/latest/reference/pip_install.html#caching

More information from the above link:

Starting with v6.0, pip provides an on-by-default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.

Jace Browning
9

pip wheel is an excellent option that does what you want with the extra feature of pre-compiling the packages. From the official docs:

Build wheels for a requirement (and all its dependencies):

$ pip wheel --wheel-dir=/tmp/wheelhouse SomePackage

Now your /tmp/wheelhouse directory has all your dependencies precompiled, so you can copy the folder to another server and install everything with this command:

$ pip install --no-index --find-links=/tmp/wheelhouse SomePackage

Note that not all the packages will be completely portable across machines. Some packages will be built specifically for the Python version, OS distribution and/or hardware architecture that you're using. That will be specified in the file name, like -cp27-none-linux_x86_64 for CPython 2.7 on a 64-bit Linux, etc.
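
To check what a given wheel was built for, the three trailing tags can be read straight off the file name. Here is a small sketch: the wheel_tags function is a hypothetical helper, and it assumes the file name follows the usual name-version[-build]-python-abi-platform.whl layout.

```shell
# Print the python, ABI and platform tags encoded in a wheel file name.
# Hypothetical helper; assumes name-version[-build]-python-abi-platform.whl.
wheel_tags() {
    base="${1##*/}"        # drop any leading directory
    base="${base%.whl}"    # drop the extension
    platform="${base##*-}"; base="${base%-*}"   # last field: platform tag
    abi="${base##*-}";      base="${base%-*}"   # next: ABI tag
    python="${base##*-}"                        # next: python tag
    printf 'python=%s abi=%s platform=%s\n' "$python" "$abi" "$platform"
}
```

Running `wheel_tags /tmp/wheelhouse/SomePackage-1.0-cp27-none-linux_x86_64.whl` prints `python=cp27 abi=none platform=linux_x86_64`.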

hdiogenes
5

Using pip only (my version is 1.2.1), you can also build up a local repository like this:

if ! pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>; then
    pip install --download-directory="$PIP_SDIST_INDEX" <package>
    pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>
fi

In the first call of pip, the package is looked up in the local repository (only) and installed from there. If that fails, pip retrieves the package from its usual location (e.g. PyPI) and downloads it to the PIP_SDIST_INDEX directory (but does not install anything!). The first call is then repeated to install the package from the local index.

(--download-cache creates a local file name which is the complete (escaped) URL, and pip cannot use this as an index with --find-links. --download-cache will use the cached file, if found. We could add this option to the second call of pip, but since the index already functions as a kind of cache, it would not necessarily gain much. It would help if your index is emptied, for instance.)
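
The same try-local-first logic can be sketched for newer pip versions, where `pip download` (pip 8.0 or newer) takes the place of --download-directory. The PIP variable and the function name are illustrative assumptions, not part of pip.

```shell
# PIP is a hypothetical override (not a pip feature) so the logic can be
# exercised with a stub; it defaults to the real pip.
PIP="${PIP:-pip}"

# Install from the local directory; on failure, download into it and retry.
# Arguments: <index directory> <package>...
install_via_local_index() {
    index_dir="$1"; shift
    if ! "$PIP" install --no-index --find-links "$index_dir" "$@"; then
        # Not available locally yet: fetch it, then install from the directory.
        "$PIP" download --dest "$index_dir" "$@" &&
        "$PIP" install --no-index --find-links "$index_dir" "$@"
    fi
}
```

Usage would look like `install_via_local_index "$PIP_SDIST_INDEX" some-package`. Only the first install of a package touches the network.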

user1010997
3

A simpler option is basket.

Given a package name, it will download it and all dependencies to a central location, without any of the drawbacks of the pip cache. This is great for offline use.

You can then use this directory as a source for pip:

pip install --no-index -f file:///path/to/basket package

Or easy_install:

easy_install -f ~/path/to/basket -H None package

You can also use it to update the basket whenever you are online.

Burhan Khalid
  • Limitations (from the official page): Basket downloads source distributions only, it cannot download packages that are not hosted on PyPI and it ignores version requirements (e.g. "nose>=1.1.2"), always downloading the latest version. – hdiogenes Jul 11 '16 at 19:22
3

There is a new solution to this called pip-accel, a drop-in replacement for pip with caching built in.

The pip-accel program is a wrapper for pip, the Python package manager. It accelerates the usage of pip to initialize Python virtual environments given one or more requirements files. It does so by combining the following two approaches:

  • Source distribution downloads are cached and used to generate a local index of source distribution archives.

  • Binary distributions are used to speed up the process of installing dependencies with binary components (like M2Crypto and LXML). Instead of recompiling these dependencies again for every virtual environment we compile them once and cache the result as a binary *.tar.gz distribution.

Paylogic uses pip-accel to quickly and reliably initialize virtual environments on its farm of continuous integration slaves which are constantly running unit tests (this was one of the original use cases for which pip-accel was developed). We also use it on our build servers.

We've seen around 10x speedup from switching from pip to pip-accel.

qris
0

I think the package "pip-accel" must be a good choice.

0

I found the following to be useful for downloading packages and then installing from those downloads:

pip download -d "$SOME_DIRECTORY" some-package

Then to install:

pip install --no-index --no-cache-dir --find-links="$SOME_DIRECTORY" some-package

Where $SOME_DIRECTORY is the path to the directory that the packages are to be downloaded to.

geogeo