The more important question is: should you be downloading "as many Python packages as possible"? Would that be a feasible and practical solution? I think not, for the following reasons:
- There are a LOT of Python packages available on PyPI, which is where most packages are hosted. As I am writing this answer, there are "278,688 projects" available. (NOTE: I know there are other sites where you can download packages, but for the purposes of this answer, let's focus on just PyPI.)
- It is not enough to download just the latest version of each package, because some packages depend on specific versions of other packages, so you will have to download those as well. For example, pandas 1.1.5 requires numpy>=1.15.4, among other dependencies.
- Not all packages are compatible with your Python version (e.g. some are Python 2 only and you are working on Python 3), your OS (e.g. some require Windows APIs or Linux APIs), or some other environment-specific configuration (e.g. some need gcc to compile). So you might need to download other things just to get each package to work.
- As mentioned in Klaus D.'s comment, you also need the documentation for each downloaded package. You will need it as a reference for package usage and for solving any issues/errors. You can only hope that the package APIs have proper __doc__ strings so that you can use help(module.function) or your IDE can show them to you with IntelliSense (see the sketch after this list).
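To make the dependency and documentation points concrete, here is a minimal sketch (assuming Python 3.8+, for importlib.metadata, and that pandas is already installed; any installed package works) that lists a package's declared dependencies and pulls up its built-in docstrings:
# Minimal sketch: inspect an installed package's declared dependencies
# and its built-in documentation.
# Assumes Python 3.8+ (for importlib.metadata) and that pandas is installed.
from importlib import metadata

import pandas

# Each entry is another package (often with a version constraint) that
# would also have to be downloaded for this one to work.
for requirement in metadata.requires('pandas') or []:
    print(requirement)  # e.g. something like "numpy>=1.15.4"

# The only offline documentation you get for free is whatever the authors
# put into the __doc__ strings.
print(pandas.read_csv.__doc__[:500])
help(pandas.read_csv)  # same docstring, rendered by the built-in help()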
With those considerations in mind, there is a way to attempt to "download them all" from PyPI, assuming you have the time, the network bandwidth, and the disk capacity to store them all on your machine: GET the list of packages from the PyPI simple index, parse the HTML for the package links, and pip install each one. Here's a sample Python script:
# Dependencies: pip install requests beautifulsoup4
# Tested on Python 3.8.6, beautifulsoup4==4.9.3, requests==2.25.1
import random
import subprocess
import sys

import requests
from bs4 import BeautifulSoup

pypi_index = 'https://pypi.python.org/simple/'

print(f'GET list of packages from {pypi_index}')
try:
    resp = requests.get(pypi_index, timeout=5)
except requests.exceptions.RequestException:
    print('ERROR: Could not GET the PyPI index. Check your internet connection.')
    sys.exit(1)

print('NOW parsing the HTML (this could take a couple of seconds...)')
try:
    soup = BeautifulSoup(resp.text, 'html.parser')
    body = soup.find('body')
    links = (pkg for pkg in body.find_all('a'))
except Exception:
    print('ERROR: Could not parse the PyPI HTML.')
    sys.exit(1)

# As a demo, I'm just going to install 5 random packages.
# If you *really* want to install them all, remove this
# limit and the sampling of 'list(links)'.
install_limit = 5
some_of_the_links = random.sample(list(links), install_limit)

for link in some_of_the_links:
    pkg_name = link['href'].split('/')[-2]
    cmd = f'pip install {pkg_name}'  # Replace `pip` with `conda` for Anaconda
    print('=' * 30)
    print(f'NOW installing "{pkg_name}"')
    try:
        subprocess.run(cmd.split(), check=True)
    except subprocess.CalledProcessError:
        print(f'ERROR: Failed to install {pkg_name}')
        continue
Note that I've limited the script to install just 5 random packages. Remove the install_limit to really install them all, but expect that not every installation will be successful because, as I said at the start, some packages are broken, not compatible with your system, or not compatible with each other.
The other alternatives to "downloading them all" are:
Option 1
You have to properly plan what Python projects you are, or will be, working on. While we can't predict the future, we can certainly research what we might need. For example, if you plan to work with Excel files, search for "reading Excel files in Python" and download the commonly mentioned packages. If you plan to work on a machine learning model, look for tutorials and take note of the packages they use.
Option 2
You can query the most downloaded packages from PyPI here: https://pypistats.org/top. If you are not satisfied with that list, try using the PyPI Stats API to get a more fine-tuned list, as in the sketch below.
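For instance, here is a rough sketch that ranks a short list of candidate packages by their recent download counts. It assumes the pypistats.org JSON endpoint https://pypistats.org/api/packages/<name>/recent is still available and that requests is installed; the candidate list itself is just a placeholder.
# Rough sketch: rank a few candidate packages by recent PyPI downloads.
# Assumes the pypistats.org endpoint /api/packages/<name>/recent is still
# available and that 'requests' is installed. The candidates are placeholders.
import requests

candidates = ['pandas', 'openpyxl', 'xlrd']  # hypothetical shortlist

downloads = {}
for name in candidates:
    resp = requests.get(f'https://pypistats.org/api/packages/{name}/recent',
                        timeout=5)
    resp.raise_for_status()
    downloads[name] = resp.json()['data']['last_month']

for name, count in sorted(downloads.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {count:,} downloads last month')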
Option 3
You can pull pre-built Docker images that come with Python packages pre-installed (see the sample below). For example, there is the datascience-notebook image, which "includes libraries for data analysis from the Julia, Python, and R communities". For web applications, there is tiangolo/uvicorn-gunicorn-fastapi for building web applications with the Uvicorn-Gunicorn-FastAPI stack. There are many more, depending on the use case. You can use those images as a reference for which Python packages you need, or use them directly as your development environment.
$ docker pull jupyter/datascience-notebook
$ docker run -it jupyter/datascience-notebook bash
(base) jovyan@fdaf7dd9db33:~$ pip list
Package                       Version
----------------------------- -------------------
alembic                       1.4.3
argon2-cffi                   20.1.0
async-generator               1.10
attrs                         20.3.0
backcall                      0.2.0
backports.functools-lru-cache 1.6.1
beautifulsoup4                4.9.3
bleach                        3.2.1
blinker                       1.4
bokeh                         2.2.3
Bottleneck                    1.3.2
...
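If one of those images is close to, but not exactly, what you need, a common pattern is to extend it with a small Dockerfile of your own. This is only a sketch; the extra packages shown are placeholders for whatever your project actually requires:
$ cat > Dockerfile <<'EOF'
# Sketch: extend the pre-built image with a few extra (placeholder) packages
FROM jupyter/datascience-notebook
RUN pip install --no-cache-dir openpyxl plotly
EOF
$ docker build -t my-datascience-notebook .
$ docker run -it my-datascience-notebook bash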