Choose Python classes to instantiate at runtime based on either user input or on command line parameters

Question

I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel would require a completely different set of classes than those used when running sequentially. But there is so much overlap between the two codes that it makes sense to have a unified code and defer the parallel/sequential behavior to a certain group of classes.

Coming from a C++ world, I would let the user set a Parallel or Serial class in the main file and use that as a template parameter to instantiate other classes at runtime. In Python there is no separate compilation step, so I'm looking for the most Pythonic way to accomplish this. Ideally, the code would determine whether the user is running sequentially or in parallel and select the classes automatically. So if the user runs mpirun -np 4 python __main__.py the code should behave entirely differently than when the user calls just python __main__.py. It makes no sense to me to have if statements determine the type of an object at runtime; there has to be a much more elegant way to do this. In short, I would like to avoid:

if isinstance(a, Parallel):
    m = ParallelObject()
elif isinstance(a, Serial):
    m = SerialObject()

I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories. In fact, it would be great if I can just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do this?

Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:

 from myPackage.parallel import *

or

 from myPackage.serial import *

and then have the parallel or serial folders import all shared modules. This would allow me to keep all classes that differentiate parallel/serial behavior with the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test because some test files will load parallel modules and some other test files would load the serial modules. Would testing work with this setup?

bruno desthuilliers · Accepted Answer · 2019-08-12T15:08:51.083

You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume this to be a rather "pythonic" solution.

wrt/ the "automagic" thing depending on execution context, if you decide to go for it, by all means make sure that 1/ both implementations can still be explicitly imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and 2/ the user can still manually force the choice.
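For reference, the pattern in os.py boils down to something like the following. This is a simplified sketch, not the actual stdlib source (which does quite a bit more): it binds one of two modules with the same API to a common name, while both remain importable on their own.

```python
import sys

# Both implementations stay explicitly importable, whatever the platform.
import ntpath
import posixpath

# Conditional selection: bind whichever implementation matches the runtime
# to a single common name (here "path", as os.py does).
if "posix" in sys.builtin_module_names:
    path = posixpath
elif "nt" in sys.builtin_module_names:
    path = ntpath
else:
    raise ImportError("no supported path implementation found")

print(path.__name__)              # depends on the platform
print(posixpath.join("a", "b"))   # a/b
print(ntpath.join("a", "b"))      # a\b
```

Because posixpath and ntpath can each be imported directly, both can be unit-tested on any machine, regardless of which one "path" ends up pointing to.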

EDIT:

So if I understand it correctly, this file you points out imports modules depending on (...)

What it "depends on" is actually mostly irrelevant (in this case it's a builtin name because the target OS is known when the runtime is compiled, but this could be an environment variable, a command line argument, a value in a config file etc). The point was about both conditional import of modules with same API but different implementations while still providing direct explicit access to those modules.

So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/__init__.py I could import all the required modules for the parallel calculation. Is this what you suggest?

Not exactly. I posted this as an example of conditional imports mostly, and eventually as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (on which basis it does so is up to you).

The point is that the end user should be able to either explicitly select an implementation (by explicitly importing the right submodule, serial or parallel, and using it directly) OR, still explicitly, ask the system to select one or the other depending on the context.

So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can be run either serial or parallel without the user having to care about it, while still being able to force one or the other where it makes sense.
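A minimal sketch of what such an "automagic" selector could look like. Everything named here is hypothetical: the MYPACKAGE_BACKEND environment variable, the select_backend function, and the demo_serial / demo_parallel stub modules, which only stand in for myPackage.serial and myPackage.parallel so the example runs as a single file.

```python
import importlib
import os
import sys
import types


def _register_stub(module_name, label):
    """Create a throwaway module so this sketch runs as a single file.

    In a real package these would be myPackage.serial and myPackage.parallel.
    """
    module = types.ModuleType(module_name)
    module.Solver = type("Solver", (), {"backend": label})
    sys.modules[module_name] = module


_register_stub("demo_serial", "serial")
_register_stub("demo_parallel", "parallel")

# Hypothetical mapping from a backend name to the module implementing it.
_BACKENDS = {"serial": "demo_serial", "parallel": "demo_parallel"}


def select_backend(choice=None, env=None):
    """Return the implementation module.

    An explicit choice wins; otherwise fall back to the (hypothetical)
    MYPACKAGE_BACKEND environment variable, then to "serial".
    """
    env = os.environ if env is None else env
    name = choice or env.get("MYPACKAGE_BACKEND", "serial")
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return importlib.import_module(_BACKENDS[name])


backend = select_backend(env={"MYPACKAGE_BACKEND": "parallel"})
print(backend.Solver.backend)                     # parallel
print(select_backend("serial").Solver.backend)    # serial
```

The criterion used for the automatic choice is up to you; the sketch only shows that the explicit choice and the automatic one can live side by side.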

My fear is that py.test will have modules from parallel and serial while testing different files and create a mess

Why and how would this happen? Remember that Python has no "process-global" namespace ("globals" are really "module-level" only) and that Python's import is absolutely nothing like C/C++ includes.

import loads a module object (can be built directly from python source code, or from compiled C code, or even dynamically created - remember, at runtime a module is an object, instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are guaranteed (with a couple caveats, but those are to be considered as error cases) to be imported only once for a given process (and then cached) so importing the same module twice in the same process will yield the same object (IOW a module is a singleton).
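This is easy to check for yourself. A quick illustration using the stdlib json module (any module would do):

```python
import sys
import types

import json
import json as json_again   # second import: served from the module cache

print(isinstance(json, types.ModuleType))  # True: a module is an object
print(json is json_again)                  # True: same object both times
print(sys.modules["json"] is json)         # True: sys.modules is the cache
```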

All this means that given something like

# module A
def foo():
    return bar(42)

def bar(x):
    return x * 2

and

# module B
def foo():
    return bar(33)

def bar(x):
    return x / 2

It's guaranteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar and B.foo will only ever call B.bar (unless you explicitly monkeypatch them of course but that's not the point).
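To see this in action without creating files, here is a small sketch that builds the two modules from source text (the make_module helper is just for illustration, it is roughly what import does for a source file) and then mixes names from both:

```python
import types


def make_module(name, source):
    """Build a module object from source text, roughly what import does."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


A = make_module("A", "def foo():\n    return bar(42)\n\ndef bar(x):\n    return x * 2\n")
B = make_module("B", "def foo():\n    return bar(33)\n\ndef bar(x):\n    return x / 2\n")

# Equivalent of "from A import foo" followed by "from B import bar":
foo = A.foo
bar = B.bar

print(foo())    # 84: A.foo still looks up bar in A's namespace
print(B.foo())  # 16.5
print(bar(42))  # 21.0
```

The name bar in the importing namespace points to B.bar, yet foo() still returns 84 because a function resolves global names in the module where it was defined.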

Also, this means that within a module you cannot have access to the importing namespace (the module or function that's importing your module), so you cannot have a module depending on "global" names set by the importer.

To make a long story short, you really need to forget about C++ and learn how Python works, as those are wildly different languages with wildly different object models, execution models and idioms. A couple interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html

EDIT 2:

(about the 'automagic' module)

I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?

I've never ever had anything to do with mpi so I can't help with this - but if the general consensus is that there's no reliable portable way to detect this then obviously there's your answer.

This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), ie the user should for example use

SOMEFLAG=serial python main.py

vs

SOMEFLAG=parallel mpirun -np 4 python main.py

or

python main.py serial

vs

mpirun -np 4 python main.py parallel

(whichever works best for your needs and is the most easily portable).

This of course requires a bit more documentation and some more effort from the end-user but well...
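On the Python side, reading either of those could look like the sketch below. The resolve_mode function and the precedence (command-line argument first, then SOMEFLAG, then serial as a default) are assumptions made for the example, not something your project has to follow.

```python
import argparse
import os

MODES = ("serial", "parallel")


def resolve_mode(argv=None, env=None):
    """Pick the run mode: command-line argument first, then SOMEFLAG, then serial."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Optional positional argument: "python main.py parallel"
    parser.add_argument("mode", nargs="?", choices=MODES, default=None)
    args = parser.parse_args(argv)
    mode = args.mode or env.get("SOMEFLAG", "serial")
    if mode not in MODES:
        raise SystemExit(f"SOMEFLAG must be one of {MODES}, got {mode!r}")
    return mode


print(resolve_mode(["parallel"], env={}))               # parallel
print(resolve_mode([], env={"SOMEFLAG": "parallel"}))   # parallel
print(resolve_mode([], env={}))                         # serial
```

In main.py you would call resolve_mode() with no arguments, so it reads sys.argv and os.environ, and hand the result to whatever selects the implementation.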

So if I understand it correctly, this file you points out imports modules depending on an existing module in the system (checking `if 'posix' in _names:` where `_names = sys.builtin_module_names`). So in a similar way, I could let the user type `from myPackage.parallel import *` and then in `myPackage/__init__.py` I could import all the required modules for the parallel calculation. Is this what you suggest? If so, what about my question regarding testing? My fear is that `py.test` will have modules from parallel and serial while testing different files and create a mess. — aaragon, Aug 12 '19 at 12:24
@aaragon cf my edited answer. TL;DR: you have to forget about C++ and learn Python, as those are totally different languages. — bruno desthuilliers, Aug 12 '19 at 13:23
Thanks for the updated response! Regarding "an additional `myPackage.automagic` that dynamically selects either `serial` or `parallel`" would be great. I would do that based on whether the user runs `mpirun` or just `python`. However, it seems it's not possible (see for instance [this](https://stackoverflow.com/questions/41899821/detecting-not-using-mpi-when-running-with-mpirun-mpiexec) or [this](https://stackoverflow.com/questions/12678337/how-can-my-program-detect-whether-it-was-launch-via-mpirun)) in a portable way without a hack. Any ideas in that direction? — aaragon, Aug 12 '19 at 14:49
Thanks, I was afraid your answer would take that direction. Thanks for all the input. — aaragon, Aug 12 '19 at 15:12
One last question. It seems that example you pointed me to works when there are two completely independent implementations. Would this work when these two implementations have some shared modules? I’m thinking a problem may arise when I load a shared module since I don't yet know whether `SerialClass` or `ParallelClass` will be used. Or should I try to export the module as just `UsedClass` and then in the shared module import it as `__import__('UsedClass')`? — aaragon, Aug 12 '19 at 15:49
I fail to see how having some shared module between both implementations would be an issue, nor why those modules should care about which implementation is calling them. Unless something in those modules needs to instantiate some class from one of the implementation modules, but even this is easily solved with dependency injection.... — bruno desthuilliers, Aug 13 '19 at 07:15

score 1 · Answer 2 · answered Aug 12 '19 at 11:30

I'm not really sure what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional.

if user_says_parallel:
    from myPackage.parallel import ParallelObject
    ObjectClass = ParallelObject
else:
    from myPackage.serial import SerialObject
    ObjectClass = SerialObject

my_abstract_object = ObjectClass()

Whether that's useful depends on your classes and on the effort it takes to make sure they have the same API, so they're compatible when replacing each other. Maybe even inheritance à la ParallelObject => SerialObject is possible, or at least a common (virtual) base class to hold all the shared code. But that's just the same as in C++.
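A rough sketch of the common base class idea, combined with a plain dict lookup, since classes are ordinary objects. BaseObject, its methods and the IMPLEMENTATIONS dict are made-up names for illustration, and the "parallel" class is only a placeholder that does nothing actually parallel:

```python
from abc import ABC, abstractmethod


class BaseObject(ABC):
    """Shared code lives here; subclasses only supply what differs."""

    def sum_of_squares(self, values):
        return self.reduce([v * v for v in values])

    @abstractmethod
    def reduce(self, values):
        """Combine the partial results."""


class SerialObject(BaseObject):
    def reduce(self, values):
        return sum(values)


class ParallelObject(BaseObject):
    def reduce(self, values):
        # Placeholder: a real version would combine results across processes.
        # Here the work is only split into two chunks to keep the sketch runnable.
        half = len(values) // 2
        return sum(values[:half]) + sum(values[half:])


# Classes are ordinary objects, so a dict lookup can replace the if/else.
IMPLEMENTATIONS = {"serial": SerialObject, "parallel": ParallelObject}

ObjectClass = IMPLEMENTATIONS["parallel"]
my_abstract_object = ObjectClass()
print(my_abstract_object.sum_of_squares([1, 2, 3]))  # 14
```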
