58

I want something like sys.builtin_module_names except for the standard library. Other things that didn't work:

  • sys.modules - only shows modules that have already been loaded
  • sys.prefix - a path that would include non-standard library modules and doesn't seem to work inside a virtualenv.

The reason I want this list is so that I can pass it to the --ignore-module or --ignore-dir command line options of trace.

So ultimately, I want to know how to ignore all the standard library modules when using trace or sys.settrace.

Tomerikoo
saltycrane

11 Answers

34

I brute-forced it by writing some code to scrape the TOC of the Standard Library page in the official Python docs. I also built a simple API for getting a list of standard libraries (for Python versions 2.6, 2.7, 3.2, 3.3, and 3.4).

The package is here, and its usage is fairly simple:

>>> from stdlib_list import stdlib_list
>>> libraries = stdlib_list("2.7")
>>> libraries[:10]
['AL', 'BaseHTTPServer', 'Bastion', 'CGIHTTPServer', 'ColorPicker', 'ConfigParser', 'Cookie', 'DEVICE', 'DocXMLRPCServer', 'EasyDialogs']
Tomerikoo
20

Python >= 3.10:

sys.stdlib_module_names

Python < 3.10:

The author of isort, a tool which cleans up imports, had to grapple with this same problem in order to satisfy the PEP 8 requirement that standard library imports be ordered before third-party imports.

I have been using this tool and it seems to be working well. You can use the method place_module in the file isort.py:

>>> from isort import place_module
>>> place_module("json")
'STDLIB'
>>> place_module("requests")
'THIRDPARTY'

Or you can get a set of module names directly. The set depends on the Python version, for example:

>>> from isort.stdlibs.py39 import stdlib
>>> for name in sorted(stdlib): print(name)
... <200+ lines>
xml
xmlrpc
zipapp
zipfile
zipimport
zlib
zoneinfo
wim
  • Somehow `sys.stdlib_module_names` still misses stuff like `pkg_resources` – xjcl Aug 07 '23 at 09:48
  • Note that `sys.stdlib_module_names` will not pick up `setuptools` and the related `pkg_resources`. That's because it technically isn't part of the standard library despite coming preinstalled in venvs. – xjcl Aug 07 '23 at 09:58
  • @xjcl There is no reason for `stdlib_module_names` to include `setuptools`/`pkg_resources`. These are not standard library and, from Python 3.12, will no longer be vendored/preinstalled to venvs either. – wim Aug 07 '23 at 15:07
  • Just trying to increase clarity for users who might be surprised. – xjcl Aug 07 '23 at 16:19
18

Why not work out what's part of the standard library yourself?

# Note: distutils was removed in Python 3.12.
import distutils.sysconfig as sysconfig
import os
std_lib = sysconfig.get_python_lib(standard_lib=True)
for top, dirs, files in os.walk(std_lib):
    for nm in files:
        # Report every .py file except package __init__ files as a dotted module name
        if nm != '__init__.py' and nm[-3:] == '.py':
            print(os.path.join(top, nm)[len(std_lib)+1:-3].replace(os.sep, '.'))

gives

abc
aifc
antigravity
--- a bunch of other files ----
xml.parsers.expat
xml.sax.expatreader
xml.sax.handler
xml.sax.saxutils
xml.sax.xmlreader
xml.sax._exceptions

Edit: You'll probably want to add a check to avoid site-packages if you need to avoid non-standard library modules.

zahypeti
Caspar
  • `sysconfig.get_python_lib(standard_lib=True)` also gives me the path of my virtualenv which doesn't have all the standard library modules. – saltycrane Jun 24 '11 at 06:20
  • In the case of virtualenv, you can reduce the problem to finding the location of the virtualenv, which [this thread](http://groups.google.com/group/python-virtualenv/browse_thread/thread/e30029b2e50ae17a) suggests can be done using `sys.real_prefix` (although I don't have a virtualenv handy to test on) – Caspar Jun 24 '11 at 06:53
  • Using `sys.real_prefix` with virtualenv is also mentioned in [this SO answer](http://stackoverflow.com/questions/1871549/python-determine-if-running-inside-virtualenv/1883251#1883251) – Caspar Jun 24 '11 at 06:56
  • Using `sys.real_prefix` works inside a virtualenv. But I do want it to work both inside and outside a virtualenv. I guess I just need some logic to decide which prefix to use. – saltycrane Jun 24 '11 at 16:24
  • I don't know what I was smoking last night, but I tried it again, and `sysconfig.get_python_lib(standard_lib=True)` works for me even when using virtualenv. – saltycrane Jun 24 '11 at 19:23
  • 3
    It turns out that if my virtualenv is activated and my current working directory is `/usr/lib/python2.6` when I invoke the Python interpreter, `sysconfig.get_python_lib(standard_lib=True)` returns the path for my virtualenv (e.g. `~/.virtualenvs/myenv/lib/python2.6`). However, if my current working directory is something other than `/usr/lib/python2.6`, it returns the correct path, `/usr/lib/python2.6`. So I was not smoking drugs last night. – saltycrane Jun 25 '11 at 06:20
  • 4
    This is a pretty good solution, but it doesn't include core libraries such as `sys`. – Adam Spiers Jan 24 '12 at 19:00
  • 2
    This doesn't print modules like `datetime`, `itertools` or `math`, which are in the `lib-dynload/` directory and have a `.so` extension (on my machine anyway). Neither is `importlib` printed, which is in `importlib/__init__.py` – zahypeti Nov 23 '19 at 20:02
13

Take a look at https://docs.python.org/3/py-modindex.html. It is an index page for the standard library modules.

Edmund
10

On Python 3.10 there is now sys.stdlib_module_names.

CCCC_David
5

Here's an improvement on Caspar's answer, which is not cross-platform, and misses out top-level modules (e.g. email), dynamically loaded modules (e.g. array), and core built-in modules (e.g. sys):

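import distutils.sysconfig as sysconfig
import os
import sys

std_lib = sysconfig.get_python_lib(standard_lib=True)

for top, dirs, files in os.walk(std_lib):
    # Path of this directory relative to the standard library root
    prefix = top[len(std_lib)+1:]
    if prefix[:13] == 'site-packages':
        continue  # skip third-party packages
    for nm in files:
        if nm == '__init__.py':
            # A package: report the directory itself
            print(prefix.replace(os.path.sep, '.'))
        elif nm[-3:] == '.py':
            print(os.path.join(prefix, nm)[:-3].replace(os.path.sep, '.'))
        elif nm[-3:] == '.so' and top[-11:] == 'lib-dynload':
            # Extension module; keep only the name before the first dot,
            # since Python 3 file names carry an ABI tag before ".so"
            print(nm.split('.')[0])

# Modules compiled into the interpreter itself
for builtin in sys.builtin_module_names:
    print(builtin)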

This is still not perfect because it will miss things like os.path which is defined from within os.py in a platform-dependent manner via code such as import posixpath as path, but it's probably as good as you'll get, bearing in mind that Python is a dynamic language and you can't ever really know which modules are defined until they're actually defined at runtime.
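
On Python 3, a similar listing of top-level names can be sketched without distutils by combining sysconfig and pkgutil. This is only a sketch under assumptions: the function name is made up, the lib-dynload location is a Unix convention, and other layouts (for example Windows) may keep extension modules elsewhere.

```python
import os
import pkgutil
import sys
import sysconfig


def stdlib_top_level_names():
    """Collect top-level standard library names found on disk plus built-ins."""
    paths = sysconfig.get_paths()
    candidates = {
        paths["stdlib"],
        paths["platstdlib"],
        # Assumption: extension modules live in lib-dynload (Unix layout)
        os.path.join(paths["platstdlib"], "lib-dynload"),
    }
    search_dirs = [d for d in candidates if os.path.isdir(d)]

    # Start with modules compiled into the interpreter (e.g. sys)
    names = set(sys.builtin_module_names)
    # iter_modules reports .py files, extension modules, and directories
    # that contain an __init__ file; plain directories are skipped
    for info in pkgutil.iter_modules(search_dirs):
        names.add(info.name)
    return names
```

Since pkgutil.iter_modules only reports directories that contain an __init__ file, a site-packages directory nested under the standard library path is not listed as a module.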

Adam Spiers
1

This will get you close:

import glob
import sys
glob.glob(sys.prefix + "/lib/python%d.%d" % sys.version_info[0:2] + "/*.py")

Another possibility for the ignore-dir option:

os.pathsep.join(sys.path)
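
When trace is used from code rather than the command line, trace.Trace takes an ignoredirs argument that plays the same role as --ignore-dir. A hedged sketch, assuming the standard library lives at the paths reported by sysconfig.get_paths(); count_lines_outside_stdlib is an illustrative name:

```python
import sysconfig
import trace


def count_lines_outside_stdlib(func, *args, **kwargs):
    """Run func under trace and return (result, line counts), ignoring stdlib dirs."""
    paths = sysconfig.get_paths()
    # stdlib and platstdlib are often the same directory; the set removes duplicates
    ignore_dirs = sorted({paths["stdlib"], paths["platstdlib"]})
    tracer = trace.Trace(count=True, trace=False, ignoredirs=ignore_dirs)
    result = tracer.runfunc(func, *args, **kwargs)
    # counts maps (filename, lineno) -> number of times the line ran
    return result, tracer.results().counts
```

On layouts where site-packages sits inside the standard library directory, ignoring that directory also skips third-party packages, which may or may not be what you want.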
Keith
  • 1
    I just realized that `sys.prefix` returns a path that doesn't include most of the standard library modules when I'm running inside a virtualenv. I edited my question above. – saltycrane Jun 24 '11 at 06:17
1

Building on @Edmund's answer, this solution pulls the list from the official website:

def standard_libs(version=None, top_level_only=True):
    import re
    from urllib.request import urlopen
    if version is None:
        import sys
        version = sys.version_info
        version = f"{version.major}.{version.minor}"
    url = f"https://docs.python.org/{version}/py-modindex.html"
    with urlopen(url) as f:
        page = f.read()
    modules = set()
    for module in re.findall(r'#module-(.*?)[\'"]',
                             page.decode('ascii', 'replace')):
        if top_level_only:
            module = module.split(".")[0]
        modules.add(module)
    return modules

It returns a set. For example, here are the modules that were added between 3.5 and 3.10:

>>> standard_libs("3.10") - standard_libs("3.5")
{'contextvars', 'dataclasses', 'graphlib', 'secrets', 'zoneinfo'}
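
To see what the regular expression picks up without touching the network, the same extraction can be run on a hand-written fragment. The HTML below is a made-up sample in the style of the index page, not copied from it, and extract_modules is a hypothetical name for the parsing half of standard_libs:

```python
import re

# Made-up fragment in the style of py-modindex.html (an assumption, not real page content)
SAMPLE_HTML = (
    '<a href="library/json.html#module-json"><code>json</code></a>'
    '<a href="library/os.path.html#module-os.path"><code>os.path</code></a>'
    '<a href="library/xml.dom.html#module-xml.dom"><code>xml.dom</code></a>'
)


def extract_modules(html, top_level_only=True):
    """Pull module names out of '#module-<name>' link anchors."""
    modules = set()
    for module in re.findall(r'#module-(.*?)[\'"]', html):
        if top_level_only:
            # "os.path" -> "os"
            module = module.split(".")[0]
        modules.add(module)
    return modules
```

With the sample above, the default call yields the top-level names json, os, and xml, while top_level_only=False keeps os.path and xml.dom as written.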

Since this is based on the official documentation, it doesn't include undocumented modules, such as:

  • Easter eggs, namely this and antigravity
  • Internal modules, such as genericpath, posixpath or ntpath, which are not supposed to be used directly (you should use os.path instead). Other internal modules: idlelib (which implements the IDLE editor), opcode, sre_constants, sre_compile, sre_parse, pyexpat, pydoc_data, nt.
  • All modules with a name starting with an underscore (which are also internal), except for __main__, _thread, and __future__, which are public and documented.

If you're concerned that the website may be down, you can just cache the list locally. For example, you can use the following function to create a small Python module containing all the module names:

def create_stdlib_module_names(
        module_name="stdlib_module_names",
        variable="stdlibs",
        version=None,
        top_level_only=True):
    stdlibs = standard_libs(
        version=version, top_level_only=top_level_only)
    # Write the set literal into a small importable module
    with open(f"{module_name}.py", "w") as f:
        f.write(f"{variable} = {stdlibs!r}\n")

Here's how to use it:

>>> create_stdlib_module_names()  # run this just once
>>> from stdlib_module_names import stdlibs
>>> len(stdlibs)
207
>>> "collections" in stdlibs
True
>>> "numpy" in stdlibs
False
MiniQuark
1

This isn't perfect, but should get you pretty close if you can't run 3.10:

import os
import distutils.sysconfig

def get_stdlib_module_names():
    stdlib_dir = distutils.sysconfig.get_python_lib(standard_lib=True)
    return {f.replace(".py", "") for f in os.listdir(stdlib_dir)}

This misses some modules such as sys, math, time, and itertools.

My use case is logging which modules were imported during an app run, so having a rough filter for stdlib modules is fine. Also I return it as a set rather than a list so membership checks are faster.
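
A rough sketch of that filtering step, assuming you already have a set of standard library names from any of the approaches in this thread. The function name and the loaded parameter are illustrative only:

```python
import sys


def non_stdlib_imports(stdlib_names, loaded=None):
    """Return the sorted top-level names of loaded modules that are not stdlib."""
    if loaded is None:
        # Default to whatever has been imported so far in this process
        loaded = sys.modules
    # Reduce "requests.adapters" to "requests" before comparing
    top_level = {name.partition(".")[0] for name in loaded}
    return sorted(top_level - set(stdlib_names))
```

For example, with stdlib names {"os", "json"} and loaded modules ["os.path", "json", "requests.adapters", "numpy"], only numpy and requests remain.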

xjcl
0

This should work for Python >= 3.4 and can be easily imported:

import sys

# pathlib was introduced in Python 3.4
try:
    from pathlib import Path
except ImportError as err:
    raise ImportError("{}. Python >= 3.4 is required to use list_stdlib but you have {}.{}".format(err, *sys.version_info[:2]))

def list_stdlib() -> frozenset:
    """
    Get the top-level standard library names found on disk for the current Python version.

    Returns:
        frozenset: Standard library module names.
    """
    # Imported here because distutils was removed in Python 3.12,
    # where main() uses sys.stdlib_module_names instead.
    import distutils.sysconfig as sysconfig

    std_lib_path = Path(sysconfig.get_python_lib(standard_lib=True))
    std_lib = set()

    # glob('*') only yields direct children of std_lib_path
    for mod in std_lib_path.glob('*'):
        if mod.stem.startswith('_') or mod.stem == 'LICENSE':
            continue
        if mod.suffix == '.py' or mod.is_dir():
            std_lib.add(mod.stem)

    return frozenset(std_lib)

def main():
    # Compare version tuples: float("3.10") would collapse to 3.1
    if sys.version_info >= (3, 10):
        return sys.stdlib_module_names
    return list_stdlib()

if __name__ == "__main__":
    print(main())
gargolito
-1

This works on Anaconda on Windows, and I suspect it will work on Linux distros.

It goes to your Anaconda directory, e.g.: C:\Users\{user}\anaconda3\Lib, where standard libraries are installed. It then pulls folder names and filenames (dropping extensions).

import sys
import os

standard_libs = []
standard_lib_path = os.path.join(sys.prefix, "Lib")
for file in os.listdir(standard_lib_path):
    standard_libs.append(file.split(".py")[0].strip().lower())

NB: Builtins, viewable via print(dir(__builtins__)), are automatically loaded, whereas standard libs are not.

MinneapolisCoder9