
I have looked around and do not see a solution for this. What I would like to do is get a list of all packages available in Python at run-time.

I looked at these:

But they are not what I am looking for.

I attempted to do this:

import pkgutil
for pkg in pkgutil.walk_packages():
    print(pkg)  # or do something with them...

However, when I do this:

import sys
sys.modules.keys()

It appears that I have loaded all the packages, which is not what I want. What I want is a list of strings naming all packages and modules available to the current Python installation, without loading them all in the process.

Boris Verkhovskiy
Tom Myddeltyn
  • I doubt it's possible without importing, as importing can trigger all kind of hooks and there is no guarantee that files match modules or the other way around (it's possible to have hooks that download modules on the fly or extract them from zip files for instance). That's even what `walk_packages` hints at with *“Note that this function must import all packages (not all modules!) on the given path, in order to access the `__path__` attribute to find submodules”*. – spectras Jun 10 '16 at 15:47
  • (or, said more concisely: it is not possible to know what submodules are in a package until the package has been fully loaded) – spectras Jun 10 '16 at 15:51
  • I don't think what you're asking to do is possible. For example, if you had the `six` third party package installed, how could you know that `six.moves.*` modules existed without importing it? – pppery Jun 10 '16 at 15:53
  • Ok, what if I back it up a bit, how about just the top-level packages and not the sub-modules? – Tom Myddeltyn Jun 10 '16 at 15:54
  • Should be doable, but I don't think an API exists for that. I'd be curious to see if someone has an answer to this. – spectras Jun 10 '16 at 16:05
  • In my system, the accepted answer does not include `sys`, `math`, and quite a few others in the output. See answer below. – sancho.s ReinstateMonicaCellio Jan 08 '20 at 09:35

3 Answers


Alright, I was curious, so I dug a bit into pkgutil and came up with this, which is much simpler than I expected:

import pkgutil
list(pkgutil.iter_modules())

It lists all top-level packages/modules available either as regular files or zip packages, without loading them. It will not see other types of packages though, unless they properly register with the pkgutil internals.

Each returned entry is a 3-tuple with the items:

  1. module_finder: The file finder instance that found the module
  2. name: The name of the module
  3. ispkg: A boolean specifying whether it is a regular module or a package.

Example 3-tuple:

(FileFinder('/usr/lib/python3/dist-packages'), 'PIL', True)
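On Python 3.6 and later, each entry is a `pkgutil.ModuleInfo` named tuple, so the three fields can also be read by attribute. A minimal sketch of splitting the result into plain modules and packages (the variable names are only illustrative):

```python
import pkgutil

# Each entry is a pkgutil.ModuleInfo named tuple on Python 3.6+,
# so the fields can be read by name instead of by index.
entries = list(pkgutil.iter_modules())

module_names = sorted(entry.name for entry in entries if not entry.ispkg)
package_names = sorted(entry.name for entry in entries if entry.ispkg)

print(len(module_names), "modules and", len(package_names), "packages found")
```

The counts depend entirely on the installation and on what is on `sys.path` when the script runs.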

And I can confirm that this did not load the PIL package:

In [11]: sys.modules['PIL']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-b0fc0af6cc34> in <module>()
----> 1 sys.modules['PIL']

KeyError: 'PIL'
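The same check can be done for every listed name at once by comparing `sys.modules` before and after the scan. This is only a sketch: pkgutil may import a few helper modules of its own while scanning, so the comparison is about the listed names rather than about `sys.modules` staying completely unchanged.

```python
import pkgutil
import sys

loaded_before = set(sys.modules)
names = sorted({info.name for info in pkgutil.iter_modules()})
loaded_after = set(sys.modules)

# Modules imported as a side effect of the scan itself
# (helpers that pkgutil needs, not the listed packages).
side_effects = loaded_after - loaded_before

# Names that were listed but are still not imported.
listed_not_loaded = [name for name in names if name not in loaded_after]

print(f"{len(names)} names listed, {len(listed_not_loaded)} of them not loaded")
```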
Mateen Ulhaq
spectras
  • Thanks! I should have tried that. I assumed that since the `walk_packages()` imported everything that so would the `iter_modules()` :-) – Tom Myddeltyn Jun 10 '16 at 17:03
  • Nice find, spectras! If you just want to list the package name, you'd refer to the tuple by its `[1]` index. – Jeremy Jun 10 '16 at 17:04
  • @busfault> I assumed the same, but reading the `pkgutil` internals I realized it would only need to load for inspecting the submodules. (it needs to check whether there is a `__path__` in Nth-level module before listing N+1th-level submodules). So I gave it a try, while checking what got loaded and what didn't, and… here it is! – spectras Jun 10 '16 at 17:09
  • In my system, this does not include `sys`, `math`, and quite a few others in the output. See answer below. – sancho.s ReinstateMonicaCellio Jan 08 '20 at 09:34
  • Is there a command to obtain the string `'/usr/lib/python3/dist-packages'`? – WinEunuuchs2Unix Dec 17 '20 at 13:31

If you need all available modules, not just the ones that are present as files in all the directories in your sys.path, then you can use (the undocumented) pydoc.ModuleScanner (which unfortunately loads the modules to work):

from pydoc import ModuleScanner
import warnings

def scan_modules():
    """Scans for available modules using pydoc.ModuleScanner, taken from help('modules')"""
    modules = {}
    def callback(path, modname, desc, modules=modules):
        if modname and modname.endswith(".__init__"):
            modname = modname[:-9] + " (package)"
        if "." not in modname:
            modules[modname] = 1
    def onerror(modname):
        callback(None, modname, None)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # ignore warnings from importing deprecated modules
        ModuleScanner().run(callback, onerror=onerror)
    return modules

modules = list(scan_modules().keys())
print(sorted(modules))

The problem with pkgutil.iter_modules is that it doesn't return all packages, only the ones that are files or directories, but CPython loads a few modules in other special ways that can't be detected by just looking at the files.

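The problem with ModuleScanner is that it returns all available modules, including the ones you've pip installed. If you only need the packages that come with Python, you can start Python with the -I command line option (isolated mode), which ignores the user site-packages directory, the PYTHON* environment variables, and the current directory. Packages installed into the interpreter's own site-packages directory are still visible under -I; adding -S skips those as well.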
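To see what a given set of flags leaves on the search path, `sys.path` can be printed from a child interpreter. The sketch below is illustrative only: the helper name `sys_path_with_flags` is made up here, and the comparison with `-S` (which skips the `site` module) is an addition for contrast, not part of the answer.

```python
import subprocess
import sys

def sys_path_with_flags(*flags):
    """Return sys.path as seen by a child interpreter started with `flags`."""
    code = "import sys; print('\\n'.join(sys.path))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()

isolated = sys_path_with_flags("-I")                # isolated mode
isolated_no_site = sys_path_with_flags("-I", "-S")  # also skip the site module

print("-I:   ", isolated)
print("-I -S:", isolated_no_site)
```

Which directories appear in each list depends on the platform and on whether a virtual environment is active.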

Here are the packages that pkgutil doesn't find on Python 3.9:

$ python3.9 -I
Python 3.9.0+ (default, Oct 19 2020, 09:51:18) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> modules = ...  # paste the above code
...
>>> import pkgutil
>>> sorted(set(modules) - {m.name for m in pkgutil.iter_modules()})
['_abc', '_ast', '_bisect', '_blake2', '_codecs', '_collections', '_csv', 
'_datetime', '_elementtree', '_functools', '_heapq', '_imp', '_io', 
'_locale', '_md5', '_operator', '_peg_parser', '_pickle', 
'_posixsubprocess', '_random', '_sha1', '_sha256', '_sha3', '_sha512', 
'_signal', '_socket', '_sre', '_stat', '_statistics', '_string', 
'_struct', '_symtable', '_thread', '_tracemalloc', '_warnings', 
'_weakref', 'array', 'atexit', 'binascii', 'builtins', 'cmath', 'errno', 
'faulthandler', 'fcntl', 'gc', 'grp', 'itertools', 'marshal', 'math', 
'posix', 'pwd', 'pyexpat', 'select', 'spwd', 'sys', 'syslog', 'time', 
'unicodedata', 'xxsubtype', 'zlib']
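At least some of the names in that list, such as `sys` and `builtins`, are compiled into the interpreter, and `sys.builtin_module_names` exposes those without importing anything. A sketch that combines it with `pkgutil.iter_modules()` follows. Treat it as an approximation: it is an assumption that the union is good enough for your purpose, and it can still miss modules provided by custom finders.

```python
import pkgutil
import sys

def top_level_names():
    """Sorted union of file-based top-level names and compiled-in modules."""
    names = {info.name for info in pkgutil.iter_modules()}
    # Modules compiled into the interpreter have no file on sys.path,
    # so pkgutil cannot see them; sys lists them by name.
    names.update(sys.builtin_module_names)
    return sorted(names)

names = top_level_names()
print("sys" in names, "builtins" in names)
```

Unlike `ModuleScanner`, this imports none of the listed modules, but which modules count as built-in varies between Python builds.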

This answer is based on @sancho.s's, except that instead of parsing the stdout of help('modules'), I have copy/pasted the code that function runs. Copying is necessary because that code prints to stdout and there is no helper function that returns the list.

Boris Verkhovskiy
  • This code is used by RustPython [here](https://github.com/RustPython/RustPython/blob/ef4bea53c6d824c067b916ce3af654d93f280582/whats_left.py#L246) – Boris Verkhovskiy Dec 22 '22 at 06:32

I put together a very rough way of getting this list (code at the end of this answer), which appears to be more accurate than pkgutil.

In addition, I found loaded_modules and list-imports, but I tested none of them.


I have compared the results of my method with the answer by spectras:

  1. All items in the output by spectras (say, modlist2) are in the output here (say, modlist1).
  2. There are quite a few items in modlist1 that are not in modlist2. To my surprise, this difference included modules like sys, math, zlib, etc. In my case, the respective lengths were 390 vs. 327, so the method with pkgutil gives quite incomplete results.

The method to pull the list of available modules consists of:

  1. Capturing output of help into a string
  2. Removing spare text from the captured string
  3. Splitting multicolumn output

Code is here:

def modules_list():
    """Return a sorted list of available modules"""
    import contextlib
    import io
    # Capture output of help into a string; redirect_stdout restores
    # sys.stdout even if help() raises
    stdout_capture = io.StringIO()
    with contextlib.redirect_stdout(stdout_capture):
        help('modules')
    help_out = stdout_capture.getvalue()
    # Remove extra text from string
    help_out = help_out.replace('.', '')
    help_out = help_out.replace('available modules', '%').replace('Enter any module', '%').split('%')[-2]
    # Split multicolumn output on any whitespace (spaces and newlines)
    help_out = help_out.split()
    help_out.sort()
    return help_out
  • Yes, the pkgutil version will only return modules that are present as files. So the modules that come builtin, compiled right into python won't be included. This is usually not a problem as such scripts are typically used for dependency scanning, and you don't care about builtins in such case — grats on hitting 10k btw ;). – spectras May 19 '20 at 14:23
  • If you look where `help('modules')` is implemented, https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/pydoc.py#L2178-L2204 you can just copy/paste that code and get the output in a list, parsing the stdout of a function call is hacky. – Boris Verkhovskiy Jan 23 '21 at 02:14