In Python, it is easy to check whether an object is a module with isinstance(obj, types.ModuleType). We can also generate modules programmatically. I am, however, interested in going the other way around: generating the code of an import statement that would have added the module to the globals/locals namespace. Basically, a function assemble_import like this:

def assemble_import(module: types.ModuleType, name: str) -> str:
    pass

It should roughly satisfy the following conditions:

import statistics
assert assemble_import(statistics, 'statistics') == 'import statistics'

from os import path
assert assemble_import(path, 'path') == 'from os import path'

import collections.abc
abc = collections.abc
assert assemble_import(abc, 'abc') == 'from collections import abc'

import abc as xyz
assert assemble_import(xyz, 'xyz') == 'import abc as xyz'

I do not want to use the abstract syntax tree, but rather the module object itself. What I have tried so far:

  • module.__spec__ - it returns ModuleSpec with the name attribute
  • module.__package__ - not sure why, but it is empty most of the time
  • module.__loader__ - usually SourceFileLoader, also has the name attribute

The problem with the name attribute is that it is 'posixpath' for os.path, while from os import posixpath clearly does not work. Also, I do not see how to get the parent module (os in the os.path example).
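
The attributes listed above can be inspected side by side. This is only a minimal sketch: describe is a hypothetical helper name, and json.decoder is an assumed stand-in for an ordinary package submodule (it is not one of the modules from the examples above).

```python
import json.decoder
import os.path


def describe(module):
    """Return the names the import machinery recorded for a module."""
    spec = module.__spec__
    return {
        '__name__': module.__name__,
        'spec.name': spec.name,
        'spec.parent': spec.parent,
    }


# An ordinary submodule records its containing package as the parent.
print(describe(json.decoder))
# os.path reports the name of the platform module it aliases.
print(describe(os.path))
```

For os.path the recorded name depends on the platform, which is why no fixed output is shown here.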

Is a workable (though not necessarily production-ready or bulletproof) solution achievable in Python? In other words, is the information about package structure that is needed to recreate the import code preserved and accessible?

krassowski
  • Sounds like an XY problem. What is the actual goal you are trying to achieve by that? Are you aware of https://docs.python.org/3/library/importlib.html? – mkrieger1 Apr 05 '20 at 21:37
  • Thank you. Yes, I am aware. I do not want to manipulate my custom modules, but any user-provided module and I am curious if achieving code generation from the objects is possible with Python. – krassowski Apr 05 '20 at 21:46
  • I don't understand yet: In `assert assemble_import(xyz, 'xyz') = 'import abc as xyz'`, where should `abc` come from in the output? – mkrieger1 Apr 05 '20 at 21:49
  • From the module object. `xyz` is aliased `abc` (`import abc as xyz`). – krassowski Apr 05 '20 at 21:51
  • The second argument is provided from globals/locals dict (and is the name of the variable, which generally cannot be obtained from the object itself). – krassowski Apr 05 '20 at 21:52
  • An example use of this would be in a re-implementation of https://github.com/microsoft/gather (or any other program which does dynamic code generation from the user input). The current implementation stores every line ever executed and attempts to re-create a workable subprogram with trial-and-error subsetting (well, not really, but this is the idea). The code generated this way can be suboptimal (see `import os; sys = os.sys`), and this implementation is leaking memory in a sense (storing every line ever executed). Static analysis and AST would be a way out, but this question is not about it. – krassowski Apr 05 '20 at 22:02
  • You're not going to get what you want for `os.path` unless you add a special case, because `os.path` is weird - it's really the `ntpath` or `posixpath` module, depending on your platform. `os.path` is just a platform-dependent alias. – user2357112 Apr 06 '20 at 08:22
  • Your desired output for `os.sys` is just bizarre. `os.sys` is not an expression you should ever write - `os` happens to import the `sys` module, and that does make `sys` an accessible attribute on the `os` module, but this is an implementation detail you should never rely on, and there's no point in relying on it. Just `import sys` through its actual name. – user2357112 Apr 06 '20 at 08:25
  • Agreed, but these are just examples that I used to demonstrate the idea, could have used os.path or something else again. – krassowski Apr 06 '20 at 08:28
  • `os.path` is unique and idiosyncratic in a way that modules are normally not supposed to be - it probably would not have been designed that way if it wasn't such an early part of Python. If you really want to treat it as `os.path` instead of `posixpath` or `ntpath`, your best bet is a special case. – user2357112 Apr 06 '20 at 08:32
  • Is `collections.abc` different from `os.path`? This was introduced in 3.3. – krassowski Apr 06 '20 at 08:39
  • `collections.abc` is quite different from `os.path`. It's an actual submodule of an actual `collections` package, and it's the same module on all platforms, rather than being a platform-dependent alias. – user2357112 Apr 06 '20 at 08:40
  • I see. `abc.__package__ == 'collections'` while `path.__package__ == ''`. The observation that `os.path` (which I was using as a toy example) is just weird and could be treated as a special case mostly solves this question. Do you have a reference to support that `os.path` is unique in its weirdness? – krassowski Apr 06 '20 at 08:44
  • Not really. I can point to the [tutorial section](https://docs.python.org/3/tutorial/modules.html#packages) on packages, and I can point to how [the `os` module and its way of setting up `os.path` looks nothing like that](https://github.com/python/cpython/blob/v3.8.2/Lib/os.py#L48-L93), and you can go look at a bunch of other packages that look nothing like what `os` does. – user2357112 Apr 06 '20 at 09:01
  • See https://stackoverflow.com/questions/2724348/should-i-use-import-os-path-or-import-os – mkrieger1 Apr 06 '20 at 09:05

1 Answer

You can, and it's quite straightforward, but os.path will be a little weird:

def assemble_import(module, name):
    return 'import {} as {}'.format(module.__name__, name)

os.path is weird because it's a platform-dependent alias for either the ntpath module or the posixpath module. This function will use the module's actual name, either ntpath or posixpath. If you want to treat os.path as os.path, you can special-case it, though it might not be the best design choice.

For actual package submodules, like collections.abc, this function will treat them as submodules of their containing package:

>>> assemble_import(collections.abc, 'abc')
'import collections.abc as abc'

but for os.path, this will give you output like

>>> assemble_import(os.path, 'path')
'import posixpath as path'
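
If the portable spelling is preferred, the special case mentioned above could look roughly like the following. This is a sketch under the assumption that an identity check against os.path is acceptable; it is not part of the original function, and it also rewrites posixpath or ntpath when they are passed in directly, because on the matching platform they are the same object as os.path.

```python
import os.path


def assemble_import(module, name):
    """Build an import statement that binds `module` to `name`."""
    # Assumed special case: os.path aliases posixpath or ntpath, so
    # compare by identity and emit the platform-independent name.
    if module is os.path:
        module_name = 'os.path'
    else:
        module_name = module.__name__
    return 'import {} as {}'.format(module_name, name)


print(assemble_import(os.path, 'path'))  # import os.path as path
```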

If you want imports that look a bit more like what a human would normally write, you can add some logic to the function:

def assemble_import(module, name):
    pname, _, mname = module.__name__.rpartition('.')
    if pname:
        statement = 'from {} import {}'.format(pname, mname)
    else:
        statement = 'import ' + mname
    if mname != name:
        statement += ' as ' + name
    return statement
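
One way to sanity-check the generated statements is to execute each one in a fresh namespace and confirm that the name ends up bound to the same module object. The sketch below repeats the function so that it runs on its own; round_trips is a hypothetical helper name, and running generated code with exec is only reasonable here because the statement is built from module names already imported in the process.

```python
import abc
import collections.abc
import os.path
import statistics


def assemble_import(module, name):
    """Build a human-style import statement from the module's __name__."""
    pname, _, mname = module.__name__.rpartition('.')
    if pname:
        statement = 'from {} import {}'.format(pname, mname)
    else:
        statement = 'import ' + mname
    if mname != name:
        statement += ' as ' + name
    return statement


def round_trips(module, name):
    """Run the generated statement and check it rebinds the same module."""
    namespace = {}
    exec(assemble_import(module, name), namespace)
    return namespace[name] is module


print(assemble_import(statistics, 'statistics'))  # import statistics
print(assemble_import(collections.abc, 'abc'))    # from collections import abc
print(assemble_import(abc, 'xyz'))                # import abc as xyz
print(round_trips(os.path, 'path'))               # True
```

For os.path the statement itself still names posixpath or ntpath, but the round trip succeeds because the alias and the platform module are the same object.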
user2357112