Given a PyPI package name, like PyYAML, how can one programmatically determine the modules available within the package (distribution package) that could be imported?

Detail

I'm not specifically interested in PyYAML; it's just a good example of a popular PyPI package whose distribution name (PyYAML) differs from its primary module name (yaml), so you can't easily guess the module name from the package name.

I've seen answers to questions that sound like this one but are actually different, likely because the word "package" has two meanings:

  • package meaning a python construct allowing for a collection of modules
  • package meaning a "Distribution Package", an archive file that contains Python packages, modules, and other resource files that are used to distribute a Release.

My question is about the relationship between distribution packages and the modules within.

Possible Solution Spaces

Areas that seem like they might be fruitful (but which I've not had success with yet) are:

  • The pydoc.help function (surfaced as the help built-in) outputs a complete list of all available modules when called as help('modules'). This shows modules that have not been imported but could be. It prints in a human-readable form to stdout, and I've been unable to figure out how the pydoc code enumerates the modules.
    • I could imagine calling this, gathering the module list, programmatically installing a new distribution package into a virtualenv with pip, calling it again and diffing the results.
  • Programmatically installing a distribution package with pip in order to
    • Iterate through elements of the python path to find modules
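
The snapshot-and-diff idea from the list above might be sketched roughly as follows. This assumes that the stdlib `pkgutil.iter_modules` is an acceptable stand-in for whatever pydoc does internally; the helper names `top_level_modules` and `newly_importable` are made up for illustration, and the pip install step between the two snapshots is left out.

```python
import pkgutil


def top_level_modules(paths=None):
    """Return the set of top-level module names found on `paths`.

    With paths=None, pkgutil.iter_modules scans sys.path.
    Built-in modules such as sys are not reported by this scan.
    """
    return {info.name for info in pkgutil.iter_modules(paths)}


def newly_importable(before, after):
    """Names present in the second snapshot but not in the first."""
    return sorted(after - before)


# Intended use (hypothetical workflow):
#   before = top_level_modules()
#   ... install the distribution into the environment ...
#   after = top_level_modules()
#   print(newly_importable(before, after))
```

The diff would also pick up names contributed by any dependencies installed at the same time, so it is only an approximation of what a single distribution provides.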
gene_wood
  • See also [How to find "import name" of any package in Python?](https://stackoverflow.com/q/7184375/674039) – wim Oct 06 '22 at 20:32

1 Answer

My project johnnydep provides exactly this feature:

$ johnnydep --fields=import_names PyYAML
name    import_names
------  --------------
PyYAML  yaml

Note that some distributions export multiple top-level names and some export none at all. There is not necessarily any obvious relationship between the distribution name (used with a pip install command) and the import name (used with an import statement), though by convention the two often match.

For example, the popular project setuptools exposes three top-level names:

$ johnnydep --fields=import_names setuptools 
name        import_names
----------  ---------------------------------------
setuptools  easy_install, pkg_resources, setuptools

API usage is via attribute access:

>>> from johnnydep.lib import JohnnyDist
>>> jdist = JohnnyDist("setuptools")
>>> jdist.import_names
['easy_install', 'pkg_resources', 'setuptools']
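
For a distribution that is already installed in the current environment, a rough stdlib-only alternative is to read its metadata with `importlib.metadata`. The sketch below is not how johnnydep works; it assumes the distribution either ships a `top_level.txt` file (a setuptools convention that not every build backend follows) or records its installed files. The helper names `top_level_from_files` and `import_names` are hypothetical.

```python
from importlib import metadata
from pathlib import PurePosixPath


def top_level_from_files(file_paths):
    """Guess top-level import names from a distribution's recorded file paths."""
    names = set()
    for raw in file_paths:
        parts = PurePosixPath(str(raw)).parts
        if not parts:
            continue
        first = parts[0]
        # Skip metadata directories, bytecode caches, and files installed
        # outside site-packages (recorded with a leading "..").
        if first.endswith((".dist-info", ".egg-info")) or first in ("..", "__pycache__"):
            continue
        if len(parts) == 1:
            if first.endswith(".py"):
                names.add(first[:-3])           # single-file module
            elif first.endswith((".so", ".pyd")):
                names.add(first.split(".")[0])  # compiled extension module
        elif first.isidentifier():
            names.add(first)                    # package directory
    return sorted(names)


def import_names(dist_name):
    """Best-effort top-level import names for an installed distribution."""
    dist = metadata.distribution(dist_name)
    declared = dist.read_text("top_level.txt")  # None if the file is absent
    if declared:
        return sorted(declared.split())
    return top_level_from_files(dist.files or [])
```

Unlike johnnydep, this only works after installation, and `metadata.distribution` raises `PackageNotFoundError` for a distribution that is not installed.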

If you want submodule names rather than top-level names, the stdlib pkgutil module can list them, for example:

>>> import pkgutil, requests
>>> [name for finder, name, ispkg in pkgutil.walk_packages(requests.__path__)]
['__version__',
 '_internal_utils',
 'adapters',
 'api',
 'auth',
 'certs',
 'compat',
 'cookies',
 'exceptions',
 'help',
 'hooks',
 'models',
 'packages',
 'sessions',
 'status_codes',
 'structures',
 'utils']
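
If fully qualified names are more convenient, the same call accepts a `prefix` argument. The following is a minimal sketch wrapped in a hypothetical helper, `submodule_names`. Note that `pkgutil.walk_packages` has to import each subpackage in order to descend into it, so import-time side effects of the package will run.

```python
import importlib
import pkgutil


def submodule_names(package_name):
    """List fully qualified submodule names beneath an importable package."""
    package = importlib.import_module(package_name)
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []  # a plain module has no submodules
    prefix = package.__name__ + "."
    return sorted(
        info.name for info in pkgutil.walk_packages(search_path, prefix=prefix)
    )
```

For example, `submodule_names("json")` includes `"json.decoder"`, while a single-file module such as `string` yields an empty list.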
wim
  • Yup, that works! Looking at your code for `johnnydep` it looks like the answer is to just brute force the thing. Create a temp directory, [download the package](https://github.com/wimglenn/johnnydep/blob/cb10eb4f8472e2df74e3c53f50e422e103d820f5/johnnydep/lib.py#L78-L83), [unzip it](https://github.com/wimglenn/johnnydep/blob/cb10eb4f8472e2df74e3c53f50e422e103d820f5/johnnydep/lib.py#L91) and [inspect the contents](https://github.com/wimglenn/johnnydep/blob/cb10eb4f8472e2df74e3c53f50e422e103d820f5/johnnydep/lib.py#L116-L117). Gets the job done, excellent. Thanks! – gene_wood Aug 27 '19 at 20:26