131

Is there a straightforward way to find all the modules that are part of a Python package? I've found this old discussion, which is not really conclusive, but I'd love to have a definite answer before I roll out my own solution based on os.listdir().

CharlesB
static_rtti
  • Bonus question: how do you import the found modules nicely? – static_rtti Nov 10 '09 at 13:06
  • What's wrong with reading the source directory? What more information do you need? What's wrong with `ls` (or `dir`)? – S.Lott Nov 10 '09 at 13:38
  • 6
    @S.Lott: There are more general solutions available, python packages are not always in directories in the filesystem, but can also be inside zips. – u0b34a0f6ae Nov 10 '09 at 13:43
  • 4
    Why reinvent the wheel? If Python acquires hypermodules in Python 4 and pkgutil is updated for that, my code will still work. I like to use the abstractions that are available. Use the obvious method provided; it is tested and known to work. If you reimplement it, you have to find and work around every corner case yourself. – u0b34a0f6ae Nov 10 '09 at 15:48
  • @S.Lott: Ah, this is all about programmatic discovery of submodules. The code I posted comes from an application that loads plugins that are submodules to the plugins package -- there is no need to keep a manual index in the program, since pkgutil can list the plugins available. – u0b34a0f6ae Nov 10 '09 at 15:51
  • @kaizer.se. What manual index? I don't get the question. What's wrong with `ls`? When I want to know the modules in a package, I use `ls -r` on the filesystem. Or I unzip the egg and use `ls -r`. Why is that inadequate? What more is required? – S.Lott Nov 10 '09 at 16:05
  • 1
    @S.Lott: So everytime the application starts, it will unzip its own egg if installed inside one just to check this? Please submit a patch against my project to reinvent the wheel in this function: http://git.gnome.org/cgit/kupfer/tree/kupfer/plugins.py#n17. Please consider both eggs and normal directories, do not exceed 20 lines. – u0b34a0f6ae Nov 10 '09 at 17:24
  • @kaiser.se: How does "everytime the application starts, it will unzip its own egg" have anything to do with this question? Please clarify this question. Why is an `ls` not adequate? Please focus on why -- in this specific question -- the `ls` is not adequate. I only want clarification on the meaning of this question. – S.Lott Nov 10 '09 at 19:11
  • @S.Lott: Are you asking about a manual `ls` by me in the shell, or actual `os.popen("ls").read()` or do you really mean `os.listdir`? – u0b34a0f6ae Nov 10 '09 at 19:47
  • `ls` in the shell is not adequate. The program should discover itself whenever I or some other dev adds a new plugin by saving a new module (say new.py) inside the plugin subpackage. The program will display a list of discovered plugins. – u0b34a0f6ae Nov 10 '09 at 19:49
  • 1
    @S.Lott: Why you don't understand that it is relevant is something you can't understand. Discovering this programmatically is about that the **application** takes interest in the content of a package, not the user. – u0b34a0f6ae Nov 10 '09 at 19:52
  • Sorry, is something *I* can't understand. – u0b34a0f6ae Nov 10 '09 at 19:54
  • @static_rtti: Is it possible for you to explain what problem you are solving? Do you recognize the "detect submodules of a package at runtime" usecase? – u0b34a0f6ae Nov 10 '09 at 22:03
  • 3
    Of course I mean programmatically! Otherwise I wouldn't have mentioned "rolling out my own solution with os.listdir()" – static_rtti Nov 12 '09 at 19:52

5 Answers

173

Yes, you want something based on pkgutil or similar. That way you can treat all packages alike, regardless of whether they live in plain directories, eggs, or zips (where os.listdir won't help).

import pkgutil

# this is the package we are inspecting -- for example 'email' from stdlib
import email

package = email
for importer, modname, ispkg in pkgutil.iter_modules(package.__path__):
    print("Found submodule %s (is a package: %s)" % (modname, ispkg))

How to import them too? You can just use __import__ as normal:

import pkgutil

# this is the package we are inspecting -- for example 'email' from stdlib
import email

package = email
prefix = package.__name__ + "."
for importer, modname, ispkg in pkgutil.iter_modules(package.__path__, prefix):
    print("Found submodule %s (is a package: %s)" % (modname, ispkg))
    module = __import__(modname, fromlist="dummy")
    print("Imported %s" % module)
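
On current Python versions, `importlib.import_module` can stand in for the `__import__(modname, fromlist="dummy")` call. The following is only a sketch under that assumption: the helper name `import_submodules` is made up for illustration, and `email` is used as the example package as above.

```python
import importlib
import pkgutil

import email  # example package from the standard library


def import_submodules(package):
    """Import each direct submodule of `package`; return {full_name: module}.

    `import_submodules` is a hypothetical helper name, not part of pkgutil.
    """
    prefix = package.__name__ + "."
    modules = {}
    for _finder, name, _ispkg in pkgutil.iter_modules(package.__path__, prefix):
        # import_module takes the dotted name and returns the submodule itself
        modules[name] = importlib.import_module(name)
    return modules


submodules = import_submodules(email)
print(sorted(submodules))
```

Passing the prefix to `iter_modules` means the yielded names are already fully qualified, so they can be handed to `import_module` unchanged.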
u0b34a0f6ae
  • 11
    What is this `importer` returned by `pkgutil.iter_modules`? Can I use it to import a module instead of using this seemingly "hackish" `__import__(modname, fromlist="dummy")`? – MestreLion Nov 05 '13 at 22:41
  • 33
    I was able to use the importer like this: `m = importer.find_module(modname).load_module(modname)` and then `m` is the module, so for example: `m.myfunc()` – chrisleague Jun 07 '14 at 00:55
  • 1
    @chrisleague I was using your method with Python 2.7, but now I need to move on to Python 3.4. In Python 3, pkgutil.iter_modules yields (module_finder, name, ispkg) instead of (module_loader, name, ispkg). What can I do to make it work like the previous one? – crax Apr 12 '17 at 12:01
  • Your first example produces the following error: **"AttributeError: 'module' object has no attribute '__path__'"** Has this anything to do with Python version? (I use Python 2.7) – Apostolos Feb 23 '18 at 22:46
  • @Apostolos, you are using only one underscore on either side of path (ie `_path_`). There should be two on either side, for a total of four (ie `__path__`). – therealmitchconnors Apr 13 '18 at 18:36
  • No, I used two underscores, but stackoverflow eats one, treating it as italics. – Apostolos Apr 17 '18 at 06:10
  • @Apostolos Here is the answer: [Python Glossary](https://docs.python.org/3/glossary.html#term-package) "Technically, a package is a Python module with an `__path__` attribute." - So we can't get the path from any module, only a package. – augustomen Oct 10 '19 at 14:03
  • Whatever. I can't remember what is all about after a year and a half. Thanks, anyway! – Apostolos Oct 12 '19 at 10:54
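
On the Python 3 question raised in the comments above: the finder yielded by `iter_modules` can, as far as I know, be asked for a module spec via `find_spec`, and the spec's loader then executes the module. This is a hedged sketch, not an official recipe. It assumes the finder exposes `find_spec` (the regular filesystem finder does), and `load_submodule` is a hypothetical helper name. Note that it builds a fresh module object without registering it in `sys.modules`, so `importlib.import_module(name)` is usually the simpler choice.

```python
import importlib.util
import pkgutil

import email  # example package from the standard library


def load_submodule(package, short_name):
    """Load one direct submodule of `package` through its finder (hypothetical helper)."""
    prefix = package.__name__ + "."
    wanted = prefix + short_name
    for finder, name, _ispkg in pkgutil.iter_modules(package.__path__, prefix):
        if name != wanted:
            continue
        spec = finder.find_spec(name)  # assumption: the finder implements find_spec
        if spec is None:
            break
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module body
        return module
    raise ImportError("no submodule %r in %s" % (short_name, package.__name__))


errors = load_submodule(email, "errors")
print(errors.__name__)
```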
53

The right tool for this job is pkgutil.walk_packages.

To list all the modules on your system:

import pkgutil
for importer, modname, ispkg in pkgutil.walk_packages(path=None, onerror=lambda x: None):
    print(modname)

Be aware that walk_packages imports all subpackages, but not submodules.

If you wish to list all submodules of a certain package then you can use something like this:

import pkgutil
import scipy
package=scipy
for importer, modname, ispkg in pkgutil.walk_packages(path=package.__path__,
                                                      prefix=package.__name__+'.',
                                                      onerror=lambda x: None):
    print(modname)

iter_modules only lists the modules which are one-level deep. walk_packages gets all the submodules. In the case of scipy, for example, walk_packages returns

scipy.stats.stats

while iter_modules only returns

scipy.stats
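
The same difference can be seen without scipy installed. A minimal sketch, using the standard library's `email` package as a stand-in; the two helper names are made up for illustration:

```python
import pkgutil

import email  # stand-in package; it contains the subpackage email.mime


def direct_submodules(package):
    """Names found by iter_modules: one level deep only."""
    prefix = package.__name__ + "."
    return {name for _, name, _ in pkgutil.iter_modules(package.__path__, prefix)}


def all_submodules(package):
    """Names found by walk_packages: recurses into subpackages."""
    prefix = package.__name__ + "."
    return {name for _, name, _ in pkgutil.walk_packages(package.__path__, prefix)}


shallow = direct_submodules(email)
deep = all_submodules(email)
# Names that only the recursive walk finds, e.g. modules inside email.mime
print(sorted(deep - shallow))
```

Both calls report `email.mime`, but only `walk_packages` descends into it and reports names such as `email.mime.text`.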

The documentation on pkgutil (http://docs.python.org/library/pkgutil.html) does not list all the interesting functions defined in /usr/lib/python2.6/pkgutil.py.

Perhaps this means the functions are not part of the "public" interface and are subject to change.

However, at least as of Python 2.6 (and perhaps earlier versions?) pkgutil comes with a walk_packages function which recursively walks through all the modules available.

unutbu
  • 6
    `walk_packages` is now in the documentation: http://docs.python.org/library/pkgutil.html#pkgutil.walk_packages – Mechanical snail Sep 01 '11 at 08:12
  • 1
    Your second example produces the following error: **"AttributeError: 'module' object has no attribute '__path__'"** - I didn't test it with 'scipy' but with a few other packages. Has this anything to do with Python version? ( I use Python 2.7) – Apostolos Feb 23 '18 at 21:51
  • 1
    @Apostolos: There should be two underscores (`_`) before and after `path` -- that is, [use `package.__path__`](https://stackoverflow.com/q/2699287/190597) rather than `package._path_`. It might be easier to try cutting & pasting the code rather than re-typing it. – unutbu Feb 24 '18 at 01:06
  • There were two of them when I wrote the comment! :) But they have been stripped by the system. My bad; I should have put three underscores. But then, this would be OK if I wanted to use italics, which I didn't! ... It's a lose-lose situation. :) Anyway, when I ran the code I used two of them, of course. (I copy-pasted the code.) – Apostolos Feb 26 '18 at 07:58
  • @Apostolos: Make sure the variable `package` is pointing to a package, not a module. Modules are files whereas packages are directories. [All packages have the `__path__` attribute](https://docs.python.org/2/tutorial/modules.html#packages-in-multiple-directories) (... unless someone deleted the attribute for some reason.) – unutbu Feb 26 '18 at 11:46
  • I see what you mean @unutbu ... Indeed, I tried with 'pip' and it works fine. Thanks! – Apostolos Feb 27 '18 at 09:18
3

This works for me:

import types

import nltk

# list every attribute of the package namespace that is itself a module
for key, obj in nltk.__dict__.items():
    if isinstance(obj, types.ModuleType):
        print(key)
Wolkenarchitekt
DarinP
  • 1
    This fails in two ways 1. packages don't always explicitly import their submodules into the top-level namespace 2. packages may import other 3rd-party modules into their top-level namespace – wim Aug 27 '19 at 19:28
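
The second failure mode in that comment can be reduced by keeping only modules whose `__name__` starts with the package's own dotted prefix. This is a sketch with a hypothetical helper name, `imported_submodules`. It does nothing about the first failure mode: submodules that have not been imported yet are still invisible to it.

```python
import types

import email.utils  # importing a submodule binds it as an attribute of `email`


def imported_submodules(package):
    """Names of already-imported submodules found in the package namespace."""
    prefix = package.__name__ + "."
    return sorted(
        obj.__name__
        for obj in vars(package).values()
        # skip unrelated modules (e.g. `sys`) that were merely imported there
        if isinstance(obj, types.ModuleType) and obj.__name__.startswith(prefix)
    )


found = imported_submodules(email)
print(found)
```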
0

I was looking for a way to reload all submodules that I'm editing live in my package. It is a combination of the answers/comments above, so I've decided to post it here as an answer rather than a comment.

import importlib
import pkgutil

import yourPackageName  # placeholder: the package whose submodules you are editing

package = yourPackageName
for importer, modname, ispkg in pkgutil.walk_packages(path=package.__path__, prefix=package.__name__+'.', onerror=lambda x: None):
    try:
        modulesource = importlib.import_module(modname)
        # In Python 3, reload lives in importlib (it was a builtin in Python 2)
        importlib.reload(modulesource)
        print("reloaded: {}".format(modname))
    except Exception as e:
        print('Could not load {} {}'.format(modname, e))
user1767754
-4

Here's one way, off the top of my head:

>>> import os
>>> filter(lambda i: type(i) == type(os), [getattr(os, j) for j in dir(os)])
[<module 'UserDict' from '/usr/lib/python2.5/UserDict.pyc'>, <module 'copy_reg' from '/usr/lib/python2.5/copy_reg.pyc'>, <module 'errno' (built-in)>, <module 'posixpath' from '/usr/lib/python2.5/posixpath.pyc'>, <module 'sys' (built-in)>]

It could certainly be cleaned up and improved.

EDIT: Here's a slightly nicer version:

>>> [m[1] for m in filter(lambda a: type(a[1]) == type(os), os.__dict__.items())]
[<module 'copy_reg' from '/usr/lib/python2.5/copy_reg.pyc'>, <module 'UserDict' from '/usr/lib/python2.5/UserDict.pyc'>, <module 'posixpath' from '/usr/lib/python2.5/posixpath.pyc'>, <module 'errno' (built-in)>, <module 'sys' (built-in)>]
>>> [m[0] for m in filter(lambda a: type(a[1]) == type(os), os.__dict__.items())]
['_copy_reg', 'UserDict', 'path', 'errno', 'sys']

NOTE: This will also find modules that might not necessarily be located in a subdirectory of the package, if they're pulled in in its __init__.py file, so it depends on what you mean by "part of" a package.

Steve Losh
  • 1
    sorry, that has no use. Aside the false positives, it will only find already-imported submodules of packages too. – u0b34a0f6ae Nov 10 '09 at 13:09