
Suppose I have a Python 3 package structured like this:

.
└── MyFunPackage/
    ├── __init__.py
    ├── helloworld.py
    └── worlds/
        ├── __init__.py
        ├── world1.py
        └── world2.py

helloworld.py defines the following class:

class World(object):
    def __init__(self, name):
        self.name = name

Every module in the worlds sub-package defines different functions. For example, world1.py may contain:

def frobulate(self):
    return f'{self.name} has been frobulated'

My end goal is to add every function in every module contained within the worlds sub-package to the World class at runtime, so that I don't need to manually change anything when I add another module to worlds/ (e.g. world3.py). However, I would also like to preserve the package hierarchy, so that a script outside the package could do the following:

from MyFunPackage.helloworld import World
aWorld = World('a')
print(aWorld.world1.frobulate()) # 'a has been frobulated'

Later, if I added a world3.py to the worlds sub-package, I should be able to add the following to the external script without making modifications to the World class:

print(aWorld.world3.wormhole(2)) # 'a has transited wormhole #2 to world3'

I think I've found some bits and pieces of what I need from various StackOverflow questions.

However, I'm having a lot of trouble fitting these pieces together, especially with the "preserving package hierarchy" bit. Is what I'm trying to accomplish here even possible? If it is, how would I go about implementing it?

nc4pk
  • Is importing them (not at runtime) an Option? You can Import modules/classes in the `World` class – Uli Sotschok Aug 08 '19 at 14:03
  • Or can you not subclass the other classes? – WiseDev Aug 08 '19 at 14:03
  • Any solution is going to be worse than whatever problem you are trying to solve by preserving the package hierarchy. – chepner Aug 08 '19 at 14:11
  • I think it's possible (without being worse than whatever problem you're trying to solve). It's basically a "plug-in" architecture. Please add code to your question showing an example usage of the `MyFunPackage` — in other words, from a script outside of the file hierarchy shown. – martineau Aug 08 '19 at 15:09

2 Answers


This kind of hierarchy definition is a bit unusual in Python projects, which is why you're having a hard time implementing it with everyday syntax. Take a step back and think about how invested in this architecture you really are; if it isn't too late to rewrite it in a way that adheres more closely to common Python idioms, maybe you should do that instead ("explicit is better than implicit" in particular comes to mind).

That being said, if everyday python doesn't cut it, you can use strange python to write what you want without too much of a hassle. Consider reading up on the descriptor protocol if you want to understand how functions are turned into methods in detail.
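If the descriptor protocol is new to you, here is a minimal standalone sketch of the mechanism involved: plain functions implement `__get__`, and calling it by hand produces the same bound method that attribute lookup on a class gives you (the `Thing`/`shout` names are made up for illustration):

```python
class Thing:
    def __init__(self, name):
        self.name = name

def shout(self):
    return self.name.upper()

t = Thing("abc")
# Plain functions are descriptors: __get__ binds them to an instance,
# which is exactly what attribute lookup on a class does behind the scenes.
bound = shout.__get__(t, Thing)
print(bound())  # ABC
```

This is the machinery that `types.MethodType` packages up for you in the answer below.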


MyFunPackage/worlds/__init__.py

from . import world1, world2

This line needs to be updated for any new world_n.py file you create. While it can be automated to import dynamically, it will break any IDE's member hinting and requires even more shifty code. You did write that you don't want to change anything else when adding modules, but adding the name of the file to this line is hopefully ok.
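For completeness, a sketch of what that dynamic variant looks like. It is demonstrated against the stdlib `json` package so it runs standalone; in practice you would point `pkgutil.iter_modules` at the `worlds` directory instead:

```python
import importlib
import pkgutil

import json  # any real package works for the demo; substitute `worlds` in practice

# Discover every submodule of the package on disk and import it by name.
submodules = {
    info.name: importlib.import_module(f"json.{info.name}")
    for info in pkgutil.iter_modules(json.__path__)
}
print(sorted(submodules))  # includes 'decoder', 'encoder', 'scanner'
```

As noted above, your IDE cannot see these imports, which is why the explicit one-liner is usually the better trade-off.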

This file should not contain any other code.

MyFunPackage/worlds/world*.py

def frobulate(self):
    return f'{self.name} has been frobulated' 

There is no need to add any special code to world1.py, world2.py, or any of the new files in the worlds folder. Just write your functions in them as you see fit.

MyFunPackage/helloworld.py

from types import MethodType, FunctionType, SimpleNamespace

from . import worlds

_BASE_ATTRIBUTES = {
    '__builtins__', '__cached__', '__doc__', '__file__',
    '__loader__', '__name__', '__package__', '__path__', '__spec__'
}


class Worlds:
    def __init__(self, name):
        self.name = name

        # for all modules in the "worlds" package
        for world_name in dir(worlds):
            if world_name in _BASE_ATTRIBUTES:
                continue  # skip module metadata attributes, and
            world = getattr(worlds, world_name)
            function_map = {}

            # collect all functions in them, by
            for func in dir(world):
                if not isinstance(getattr(world, func), FunctionType):
                    continue  # ignoring non-functions, and
                if getattr(world, func).__module__ != world.__name__:
                    continue  # ignoring names that were only imported

                # turn them into methods of the current worlds instance
                function_map[func] = MethodType(getattr(world, func), self)

            # and add them to a new namespace that is named after the module
            setattr(self, world_name, SimpleNamespace(**function_map))

The module addition logic is completely dynamic and does not need to be updated in any way when you add new files to worlds.
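To see the two filters from the loop above in isolation, here is a small standalone sketch (the module name `world_demo` is made up) showing how the `FunctionType` and `__module__` checks keep locally defined functions and drop imported names:

```python
import types
from types import FunctionType

# Build a throwaway module to mimic world1.py.
mod = types.ModuleType("world_demo")
exec(
    "from os.path import join\n"
    "def frobulate(self):\n"
    "    return f'{self.name} has been frobulated'\n",
    mod.__dict__,
)

# The same filter as in Worlds.__init__: keep plain functions that were
# actually defined in the module, dropping imported names like `join`.
picked = [
    name for name in dir(mod)
    if isinstance(getattr(mod, name), FunctionType)
    and getattr(mod, name).__module__ == mod.__name__
]
print(picked)  # ['frobulate']
```

`join` is a `FunctionType` too, but its `__module__` points at `posixpath`/`ntpath`, so the second check removes it.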


After setting it up as a package and installing it, trying your example code should work:

>>> from MyFunPackage.helloworld import Worlds
>>> x = Worlds('foo')
>>> x.world1.frobulate()
'foo has been frobulated'

Thanks python, for exposing your internal workings so deliberately.


Tangent: Dynamically adding functions to objects, patching vs describing

Using types.MethodType to turn a function into a method sets up said descriptor protocol on it and binds the function to the owning instance. This is preferable to patching the instance into the signature for a number of reasons.

I'll give an example real quick, because I think this is good to know. I'll skip the namespace here, since it doesn't change the behavior and would just make it a little harder to read:

class Foo:
    """An example class that does nothing yet."""
    pass

def bar(self, text: str) -> str:
    """An example function, we will add this to an instance."""
    return f"I am {self} and say {text}."

import inspect
import timeit  
import types
# now the gang's all here!

Patching with a lambda

>>> foo = Foo()
>>> foo.bar = lambda *args, **kwargs: bar(foo, *args, **kwargs)
>>> foo.bar('baz')
'I am <__main__.Foo object at 0x000001FB890594E0> and say baz.'
# the behavior is as expected, but ...

>>> print(foo.bar.__doc__)
None
# the doc string is gone
>>> foo.bar.__annotations__
{}
# the type annotations are gone
>>> inspect.signature(foo.bar)
<Signature (*args, **kwargs)>
# the parameters and their names are gone
>>> min(timeit.repeat(
...     "foo.bar('baz')",
...     "from __main__ import foo",
...     number=100000)
... )
0.1211023000000182
# this is how long a single call takes
>>> foo.bar
<function <lambda> at 0x000001FB890594E0>
# as far as it is concerned, it's just some lambda function

In short, while the base functionality is reproduced, a lot of information is lost along the way. There is a good chance that this will become a problem down the road, whether because you want to properly document your work, want to use your IDE's type hinting, or have to go through stack traces during debugging and want to know which function exactly caused problems.

While it's completely fine to do something like this to patch out a dependency in a test suite, it's not something you should do in the core of your codebase.

Changing the descriptor

>>> foo = Foo()
>>> foo.bar = types.MethodType(bar, foo)
>>> foo.bar('baz')
'I am <__main__.Foo object at 0x00000292AE287D68> and say baz.'
# same so far, but ...

>>> foo.bar.__doc__
'An example function, we will add this to an instance.'
# the doc string is still there
>>> foo.bar.__annotations__
{'text': <class 'str'>, 'return': <class 'str'>}
# same as type annotations
>>> inspect.signature(foo.bar)
<Signature (text: str) -> str>
# and the signature is correct, without us needing to do anything
>>> min(timeit.repeat(
...     "foo.bar('baz')",
...     "from __main__ import foo",
...     number=100000)
... )
0.08953189999999722
# execution time is 25% lower due to less overhead, no delegation necessary here
>>> foo.bar
<bound method bar of <__main__.Foo object at 0x00000292AE287D68>>
# and it knows that it's a method and belongs to an instance of Foo

Binding a function as a method in this way retains all information properly. As far as python is concerned, it is now the same as any other method that was bound statically and not dynamically.

Arne

So, this is probably not the problem Python was designed to solve, but we can make it work.

There are two separate parts to this dilemma: first, "how do I import all these packages without knowing them in advance?", and second, "how do I bind those packages to a World object in a way that allows me to call methods on them with self as the first parameter?" I'll tackle these in order.


How do I import all the packages in the directory?

__init__.py contains the code that runs whenever the package is imported. Usually it's responsible for gathering all the important resources in a package and building a local namespace that others can use. We're going to slightly abuse this behavior:

worlds/__init__.py

import os, pkgutil

# import the names of all modules in this directory, save it to __all__
# this allows us to later do `from worlds import world1`, etc., if we want
# (though our helloworld doesn't actually do that)
__all__ = list(module for _, module, _ in pkgutil.iter_modules([os.path.dirname(__file__)]))

# make an attribute called `worlds` that is a dict between the name of each
# module in this folder, and the module itself.
worlds = {}
for _world_name in __all__:
    worlds[_world_name] = __import__(_world_name, locals(), globals(), level=1)

# You might want to do this as a dict comprehension, but that doesn't work:
#
#      worlds2 = {_world_name:__import__(_world_name, locals(), globals(), level=1)
#                 for _world_name in __all__}
#
# fails with:
#
#   File ".../worlds/__init__.py", line 10, in <dictcomp>
#       for _world_name in __all__}
#   KeyError: "'__name__' not in globals"
#
# The reason: the call passes `locals()` where `__import__` expects its
# `globals` argument. At module level that's harmless, because locals() and
# globals() are the same dict there -- but a comprehension runs in its own
# scope, so its locals() has no `__name__` entry, which __import__ needs in
# order to resolve the relative (level=1) import.

This does two things. First, it allows us to selectively do the usual from worlds import world1, world2, ... if we want. That's what assigning to __all__ does. The method for finding all importable modules is taken straight from this answer.

However, this leaves __all__ as a list of strings, which isn't really useful to helloworld on its own. So I then create a dict worlds that maps the name of each world to the module that name refers to (by dynamically importing the modules via __import__()). Now we can also get to world1 by doing worlds.worlds['world1']. This is more useful to us.
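As an aside, `importlib.import_module` is the documented, friendlier wrapper around `__import__`, and it sidesteps the `globals()`/`locals()` bookkeeping entirely. A standalone sketch, using a stdlib submodule as a stand-in for `world1`:

```python
import importlib

# Relative import of a known submodule -- the same shape as importing
# ".world1" from inside the "worlds" package.
mod = importlib.import_module(".decoder", package="json")
print(mod.__name__)  # json.decoder
```

Inside the actual `worlds/__init__.py`, the equivalent call would look like `importlib.import_module(f'.{_world_name}', package=__package__)`, and it works fine inside a dict comprehension.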


How do I bind these packages/functions to World?

There are another two parts to this question: "how do I bind these packages", and "how do I get the function calls to still pass my World instance as a parameter". The first answer is straightforward: simply import worlds, then iterate through worlds.worlds.items() and use setattr() to assign key-value pairs as attributes.

But if we do this:

for module_name, module in worlds.worlds.items():
    setattr(self, module_name, module)

then we get the wrong behavior:

>>> x = helloworld.World('hello')
>>> x.world1.frobulate()
TypeError: frobulate() missing 1 required positional argument: 'self'

The solution to this issue is to put some sort of in-between wrapper in, which adds the instance of World() as the first argument whenever you try to call something on it. I do this by creating a new inner class, SubWorld, that on initialization effectively re-binds every method in the module.

Hence, this completed code:

helloworld.py

import worlds

# here's your generic World object
class World(object):
    def __init__(self, name):
        self.name = name
        # We take the dict that we created in worlds/__init__.py, and
        # iterate through it
        for world_name, module in worlds.worlds.items():
            # for each name/module pair, we assign that name as an attribute
            # to this object, paired to an object that holds all of its methods.
            # We could just pass the module itself as the third argument here,
            # but then `self` doesn't get passed as the first parameter. So,
            # we use an instance of a wrapper class which takes care of that.
            # See below.
            setattr(self, world_name, self.SubWorld(self, module))

    # Instead of importing the module wholesale, we make an inner class
    # and have that subclass essentially delegate functionality, by
    # essentially prepending the `self` parameter to the call.
    class SubWorld:
        def __init__(self, world, module):
            # scan all the attributes of the module
            for name in dir(module):
                obj = getattr(module, name)
                # if the object is a callable function, then add the World instance
                # as a `self`. We do this using a lambda.
                if callable(obj):
                    # We have the lambda take *args and **kwargs - that is,
                    # an arbitrary, catch-all list of args and kwargs to pass
                    # on. Then, we forward the function call with the same args
                    # and kwargs, except that we add `world` as the first
                    # argument (to take the place of `self`).
                    # Note the `_obj=obj` default argument: it freezes the
                    # current value of `obj` into each lambda. Without it,
                    # every lambda would close over the same loop variable and
                    # call whatever `obj` pointed to on the last iteration.
                    # We then set this lambda as an attribute with the same
                    # name it had in the module we took the function from.
                    setattr(self, name, lambda *a, _obj=obj, **k: _obj(world, *a, **k))

This gives us the intended behavior:

>>> import helloworld
>>> x = helloworld.World('Tim')
>>> print(x.world1.frobulate())
Tim has been frobulated

Depending on how each worldn module is supposed to work, you can modify SubWorld accordingly (e.g. if references to module-level variables need to be maintained alongside references to functions). A good way to handle that dynamically is to forward attribute access to the module itself rather than copying each name over.
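One concrete way to forward module-level variables without copying them (a sketch; `world_demo` and `gravity` are made-up names) is to give `SubWorld` a `__getattr__` that delegates to the wrapped module:

```python
import types

# A throwaway module standing in for world1.py.
mod = types.ModuleType("world_demo")
mod.gravity = 9.8

class SubWorld:
    """Forwards unknown attribute reads to the wrapped module."""
    def __init__(self, module):
        self._module = module

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails on the instance,
        # so the method attributes set in __init__ still take priority.
        return getattr(self._module, name)

sw = SubWorld(mod)
print(sw.gravity)  # 9.8
```

This keeps the module as the single source of truth, so later changes to its variables are visible through the wrapper as well.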

Green Cloak Guy
  • Registering functions to an object like this with a lambda is really hacky: it mangles the signature to `(*args, **kwargs)`, doesn't carry over `__doc__` or `__annotations__`, and the function is not aware of its new ownership. You should really just change the descriptor on the functions instead. – Arne Aug 13 '19 at 14:30