I am trying to list all the APIs of a library (e.g. scikit-learn, pandas) programmatically. I use the importlib and inspect modules to do this.
import importlib
import inspect
import re


def get_modules(lib_name):
    module_dict = {}
    for name, val in inspect.getmembers(lib_name, inspect.ismodule):
        if re.match(r'(?!^_+.+)', name):  # filter non-public modules/methods
            if re.match(r'(?=.module \'sklearn\..+)', f'{val}'):
                # val has a common format of <module 'sklearn.svm...
                # to extract the module name remove the string from < to '
                # and everything after the module name
                module_dict[name] = f'{val}'[9:16+len(name)+1]
    return module_dict


lib_name = importlib.import_module('sklearn')
module_dict = get_modules(lib_name)
The output is as follows.
{'base': 'sklearn.base', 'exceptions': 'sklearn.exceptions', 'externals': 'sklearn.externals', 'utils': 'sklearn.utils'}
If I also include from sklearn import cluster (it does not really have to be cluster; any valid submodule would do), I get the intended output.
{'base': 'sklearn.base', 'cluster': 'sklearn.cluster', 'decomposition': 'sklearn.decomposition', 'exceptions': 'sklearn.exceptions', 'externals': 'sklearn.externals', ...}
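The same effect appears to be reproducible with a standard-library package, which suggests it is not specific to sklearn. Below is a minimal sketch, assuming that importing a submodule binds it as an attribute on its parent package. The helper name visible_submodules is hypothetical, and json / json.tool are only stand-ins for sklearn and its submodules.

```python
import importlib
import inspect


def visible_submodules(package):
    """Return the names of modules currently bound as attributes of `package`."""
    return {name for name, _ in inspect.getmembers(package, inspect.ismodule)}


pkg = importlib.import_module('json')
before = visible_submodules(pkg)

# Importing a submodule binds it as an attribute on the parent package,
# so inspect.getmembers can only see it after this call.
importlib.import_module('json.tool')
after = visible_submodules(pkg)

# Names that became visible only because of the extra import
# (empty if json.tool had already been imported elsewhere).
print(sorted(after - before))
```

If this assumption holds, it would explain why a single explicit submodule import changes what inspect.getmembers reports for the package.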
I would appreciate some help figuring this out. I prefer to use importlib because I can simply list the libraries I want to collect information from instead of hard-coding the import statements.
FYI, if I use import sklearn and replace lib_name in inspect.getmembers(lib_name, ...) with sklearn, I get the correct results as well.
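One direction that might avoid depending on which submodules happen to be imported is pkgutil.iter_modules, which lists the submodules found on a package's __path__ without importing them. The following is only a sketch under that assumption, not a confirmed fix for the sklearn case. The helper name list_public_submodules is hypothetical, and json is used so the example runs without third-party libraries.

```python
import importlib
import pkgutil


def list_public_submodules(lib_name):
    """Map public submodule names of `lib_name` to their dotted names.

    Submodules are discovered from the package's search path,
    so they do not need to be imported first.
    """
    package = importlib.import_module(lib_name)
    # Only packages have __path__; a plain module has no submodules to list.
    search_path = getattr(package, '__path__', None)
    if search_path is None:
        return {}
    return {
        info.name: f'{package.__name__}.{info.name}'
        for info in pkgutil.iter_modules(search_path)
        if not info.name.startswith('_')  # skip non-public submodules
    }


print(list_public_submodules('json'))
```

This keeps the "list of library names" workflow: each name is passed as a string, and no per-library import statement is hard-coded.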