In Brief

How can I monkey-patch module A from module B when module A's functions need to remain importable, so that I can still run module A's functions with the multiprocessing standard library package?

Background

A client requested a hotfix that will not be applicable to any of our other clients, so I created a new branch and wrote a separate module just for them, which keeps it easy to merge in changes from the master branch. To maintain the client's backward compatibility with pre-hotfix behavior, I implemented the hotfix as a configurable setting in the app. Thus I didn't want to replace my old code, just patch it when the setting is turned on. I did this by monkey patching.

Code Structure

The __main__ module reads in the configuration file. If the configuration turns on the hotfix's switch, __main__ patches my engine module by replacing a couple of functions with code defined in the hotfix module; in essence, the replaced function is the key function used by a maximization routine. The engine module later loads up a pool of multiprocessing workers.
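
For concreteness, here is a minimal sketch of that arrangement; all of the module and function names (engine, hotfix, key_function, read_config) are hypothetical stand-ins, and the key functions are dummies:

    # engine.py (hypothetical)
    def key_function(candidate):
        """Pre-hotfix behavior: the key the maximizer ranks candidates by."""
        return candidate

    def maximize(candidates):
        return max(candidates, key=key_function)

    # hotfix.py (hypothetical)
    def key_function(candidate):
        """Client-specific replacement behavior."""
        return -candidate

    # __main__ module (hypothetical)
    import engine
    import hotfix

    def read_config():
        return {"hotfix_enabled": True}  # stand-in for parsing the real file

    if read_config()["hotfix_enabled"]:
        engine.key_function = hotfix.key_function  # the monkey patch

    # ... later, engine starts a multiprocessing pool of workers ...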

The Problem

Once a multiprocessing worker gets started, the first thing multiprocessing does is re-import* the engine module and look for the key function that __main__ had tried to replace (then multiprocessing hands control over to my code and the maximization algorithm begins). Since engine is re-imported by a brand-new process, and that process does not re-run __main__ (where the configuration file gets read) because doing so would cause an infinite loop, the worker never knows to re-monkey-patch engine.

The Question

How can I maintain modularity in my code (i.e., keeping the hotfix code in a separate module) and still take advantage of Python's multiprocessing package?

* Note my code has to work on Windows (for my client) and Unix (for my sanity...)

wkschwartz

3 Answers

This sounds like a place where monkey-patching just won't work. It's easier to extract the functions in question into a separate module and have engine import them from there. Perhaps where to import them from can itself be a configuration setting.
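
A minimal sketch of that idea, assuming a hypothetical config module whose KEY_FUNCTION_MODULE setting names the module to import from (e.g. "hotfix"):

    # engine.py -- imports the key function from a configurable module
    import importlib

    import config  # hypothetical: holds KEY_FUNCTION_MODULE, e.g. "hotfix"

    _impl = importlib.import_module(config.KEY_FUNCTION_MODULE)
    key_function = _impl.key_function

    def maximize(candidates):
        return max(candidates, key=key_function)

Because the selection happens at import time inside engine itself, it is repeated automatically whenever a worker process re-imports engine.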

Another way to modularize this is to use some sort of component architecture, such as the Zope Component Architecture (ZCA). That last option is what I would go with, but only because I'm used to it, so there is no extra learning involved for me.
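
A rough sketch of what the ZCA route might look like using zope.interface and zope.component; the interface and the two classes here are invented for illustration:

    from zope.interface import Interface, implementer
    from zope.component import getGlobalSiteManager, getUtility

    class IKeyFunction(Interface):
        def score(candidate):
            """Return the value the maximizer should rank a candidate by."""

    @implementer(IKeyFunction)
    class DefaultKey:
        def score(self, candidate):
            return candidate

    @implementer(IKeyFunction)
    class HotfixKey:
        def score(self, candidate):
            return -candidate

    # Register whichever implementation the configuration calls for:
    getGlobalSiteManager().registerUtility(HotfixKey(), IKeyFunction)

    # engine then looks the component up instead of hardcoding it:
    key = getUtility(IKeyFunction)

Note that the registration still has to run in every process, so it would live somewhere that engine itself imports.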

Lennart Regebro

To make it work on a UNIX/Linux OS which has fork(), you don't need to do anything special since the new process has access to the same (monkey-patched) classes as the parent.

To make it work on Windows, have your __main__ module read the configuration on import (put the read_config/patch_engine call at global scope), but do the multiprocessing (engine execution) inside an if __name__ == '__main__' guard.

Then the read-config code runs whenever __main__ is imported (either from the command line or from a multiprocessing re-import), but the if __name__ == '__main__' code runs only when your script is invoked from the command line (since __main__ is re-imported under a different name in the child process).
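
A sketch of that layout, reusing the hypothetical names from the question (read_config, engine.key_function, hotfix.key_function, engine.run):

    # __main__ module
    import engine
    import hotfix

    def read_config():
        return {"hotfix_enabled": True}  # stand-in for the real config parser

    # Module scope: this runs on every import of the module, including the
    # re-import multiprocessing performs in each Windows worker process.
    if read_config()["hotfix_enabled"]:
        engine.key_function = hotfix.key_function

    if __name__ == "__main__":
        # Only the original command-line invocation reaches this block; in
        # the workers the module is re-imported under a different __name__.
        engine.run()  # starts the multiprocessing pool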

nneonneo

It sounds like you are going to have to modify engine.py to check a configuration file, and have it patch itself if needed.

To work on both Unix and Windows, engine can keep a global CONFIG_DONE variable to decide whether it needs to check the configuration file again.
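
A sketch of a self-patching engine.py along those lines; read_config and the hotfix module are hypothetical stand-ins:

    # engine.py
    CONFIG_DONE = False

    def key_function(candidate):
        return candidate  # default, pre-hotfix behavior

    def read_config():
        return {"hotfix_enabled": False}  # stand-in for the real config parser

    def apply_config():
        """Patch this module in place; cheap to call more than once."""
        global CONFIG_DONE, key_function
        if CONFIG_DONE:
            return
        if read_config()["hotfix_enabled"]:
            from hotfix import key_function as patched
            key_function = patched
        CONFIG_DONE = True

    apply_config()  # runs once in every process that imports engine

Since every worker re-imports engine, the check runs in each process without any help from __main__.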

Ethan Furman