
I am trying to get all APIs from libraries (e.g. scikit-learn, pandas) programmatically. I use the importlib and inspect modules to do this.

import inspect
import importlib
import re

def get_modules(lib_name):
    module_dict = {}
    for name, val in inspect.getmembers(lib_name, inspect.ismodule):
        if re.match(r'(?!^_+.+)', name): # filter non-public modules/methods
            if re.match(r'(?=.module \'sklearn\..+)', f'{val}'):
                # val has a common format of <module 'sklearn.svm...
                # to extract the module name remove the string from < to '
                # and everything after the module name
                module_dict[name] = f'{val}'[9:16+len(name)+1]

    return module_dict

lib_name = importlib.import_module('sklearn')
module_dict = get_modules(lib_name)
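As an aside, the regex and the fixed slice offsets above only work for packages whose name is exactly as long as "sklearn". A minimal sketch of the same function that reads each module's __name__ instead of slicing its repr() is below. It still only finds submodules that are already bound as attributes of the package, so it does not by itself fix the missing-submodule problem. It is shown with the standard-library json package (an assumption, chosen so it runs without scikit-learn installed).

```python
import importlib
import inspect


def get_modules(package):
    """Map public attribute names to full module names for the submodules of
    `package` that are currently bound as attributes (i.e. already imported)."""
    prefix = package.__name__ + "."
    module_dict = {}
    for name, val in inspect.getmembers(package, inspect.ismodule):
        # Skip private names such as `_config` or `__check_build`.
        if name.startswith("_"):
            continue
        # Keep only genuine submodules; this drops unrelated modules the
        # package merely imports (for example `codecs` inside `json`).
        if val.__name__.startswith(prefix):
            module_dict[name] = val.__name__
    return module_dict


pkg = importlib.import_module("json")
print(get_modules(pkg))
```

Using __name__ also keeps the result correct when a submodule is bound under an alias.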

The output is as follows.

{'base': 'sklearn.base', 'exceptions': 'sklearn.exceptions', 'externals': 'sklearn.externals', 'utils': 'sklearn.utils'}

If I include from sklearn import cluster (it does not have to be cluster; any legitimate submodule would do), I get the intended output.

{'base': 'sklearn.base', 'cluster': 'sklearn.cluster', 'decomposition': 'sklearn.decomposition', 'exceptions': 'sklearn.exceptions', 'externals': 'sklearn.externals', ...}
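A likely explanation, hedged since it depends on what sklearn/__init__.py imports in a given version: a package's submodules are not attributes of the package object until something imports them. sklearn/__init__.py presumably imports only the few seen in the first output, and importing sklearn.cluster transitively imports others (decomposition, etc.), each of which then gets bound on the sklearn object. The sketch below demonstrates the mechanism with the standard-library xml package as a stand-in.

```python
import importlib
import inspect
import sys

xml = importlib.import_module("xml")


def public_submodules(package):
    """Names of public module-valued attributes currently bound on `package`."""
    return sorted(
        name
        for name, _ in inspect.getmembers(package, inspect.ismodule)
        if not name.startswith("_")
    )


# xml/__init__.py imports no submodules itself, so unless something else in
# this interpreter already did, `dom` is not yet an attribute of `xml`.
print("before:", public_submodules(xml))

# Importing a deeper module also imports (and binds) every parent on the way:
# xml.dom becomes an attribute of xml, and minidom an attribute of xml.dom.
importlib.import_module("xml.dom.minidom")
print("after:", public_submodules(xml))
print("xml.dom" in sys.modules)  # True
```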

I would appreciate some help figuring this out. I prefer to use importlib, since I can simply list the libraries I want to collect information about instead of hard-coding the import statements.
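One way to keep the "list of library names" approach without relying on what happens to be imported is to ask the import system which submodules exist on disk. The sketch below uses pkgutil.iter_modules over the package's __path__; it lists only the top level and does not import the submodules it finds. The library names in the loop are placeholders (stdlib packages so it runs anywhere); substitute "sklearn", "pandas", etc.

```python
import importlib
import pkgutil


def list_submodules(lib_name):
    """Return the full names of the public top-level submodules of a package,
    discovered via pkgutil without importing them."""
    package = importlib.import_module(lib_name)
    # Plain modules (not packages) have no __path__ and hence no submodules.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(
        f"{lib_name}.{info.name}"
        for info in pkgutil.iter_modules(search_path)
        if not info.name.startswith("_")
    )


# Placeholder list of libraries to scan.
for lib in ["json", "xml"]:
    print(lib, list_submodules(lib))
```

If you then need the module objects themselves (e.g. to run inspect.getmembers on each), call importlib.import_module on each name, keeping in mind that this executes each submodule's import-time code. For nested subpackages, pkgutil.walk_packages(package.__path__, prefix=lib_name + ".") recurses, but it imports each subpackage in order to find its children.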

FYI, if I use import sklearn and replace lib_name in inspect.getmembers(lib_name, ...) with sklearn, I get the correct results as well.
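For comparison, importlib.import_module('sklearn') and import sklearn return the very same module object cached in sys.modules, so inspect.getmembers() should see identical attributes either way. If the two give different results, some other submodule import has presumably run in between (for instance in an earlier cell of a notebook). A quick check, with the stdlib json package standing in for sklearn:

```python
import importlib
import json
import sys

via_importlib = importlib.import_module("json")

# Both spellings hand back the one module object cached in sys.modules.
print(via_importlib is json)                 # True
print(via_importlib is sys.modules["json"])  # True
```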

akalanka
