
As far as I understand, an IPython cluster manages a set of persistent namespaces (one per engine). As a result, if a module imported by an engine engine_i is modified, killing the main interpreter is not enough for that change to be reflected in the namespace of engine_i.

Here's a toy example that illustrates this:

# main.py
from ipyparallel import Client
from TC import test_class  # TC is defined in the next code block

if __name__ == "__main__":
    cl = Client()
    cl[:].execute("import TC")
    lv = cl.load_balanced_view()
    lv.block = True
    tc = test_class()
    res = lv.map(tc, [12, 45])
    print(res)

where the TC module consists only of

#TC.py
class test_class:
    def __call__(self,y):
        return -1

Here, consider the following execution:

$ipcluster start -n <any_number_of_engines> --daemonize
$python3 main.py
[-1, -1]
$ #open some editor and modify TC.py so that test_class.__call__ returns -2 instead of -1
$python3 main.py #output is still [-1, -1] instead of [-2, -2]
[-1, -1]

This is expected, since the engines have their own persistent namespaces. A trivial way to make sure that changes to TC reach the engines is simply to kill them (e.g. via $ipcluster stop) and restart them before running the script again.

However, killing and restarting the engines quickly becomes tedious if you need to modify a module frequently. So far, I've found a few potential solutions, but none of them is really satisfactory:

  1. If the modification is made to a module that is directly imported into the engine's namespace, like TC above:

    cl[:].execute("from imp import reload; import TC; reload(TC)")
    

    However, this is very limited as it is not recursive (e.g. if TC.test_class.__call__ itself imports another_module and we modify another_module, this solution won't work); see the importlib sketch after this list.

  2. Because of the problem with the previous solution, I tried IPython's deepreload in combination with %autoreload:

    from IPython import get_ipython
    ipython=get_ipython()
    ipython.magic("%reload_ext autoreload")
    ipython.magic("%autoreload 2")
    cl[:].execute("import builtins;from IPython.lib import  deepreload;builtins.reload=deepreload.reload;import TC;reload(TC)")
    

    This doesn't seem to work at all for reasons that so far I haven't understood.

  3. The %reset magic from IPython is supposed (per the documentation) to clear the namespace, but it didn't work on the engine namespaces, even in the toy example given above.

  4. I tried to adapt the first answer given here to clean up the engine namespaces. However, it doesn't seem to help with re-importing modified modules.
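For reference, here is what the first approach looks like with importlib (the imp module is deprecated in Python 3). This is only a sketch, and it inherits the same limitation: only TC itself is reloaded, not the modules that TC imports.

    # Sketch of approach 1 using importlib instead of the deprecated imp module.
    # Only TC itself is reloaded; modules imported by TC stay cached.
    from ipyparallel import Client

    cl = Client()
    cl[:].execute("import importlib, TC; importlib.reload(TC)", block=True)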

It seems to me that the most reliable solution is therefore to just kill and restart the engines each time. It looks like this can't even be done from within the script, as cl.shutdown(restart=True) throws NotImplementedError. Is everyone working with ipyparallel constantly restarting their clusters manually, or is there something obvious that I'm missing?
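For completeness, the closest workaround I've come up with is to script the restart cycle by shelling out to ipcluster. This is only a rough sketch (it assumes ipcluster is on the PATH and a fixed number of engines), not a real substitute for restarting from the API:

    # Rough sketch: restart the whole cluster by shelling out to ipcluster,
    # then reconnect and wait until the engines have registered.
    import subprocess
    import time

    from ipyparallel import Client


    def restart_cluster(n_engines=4, timeout=60):
        subprocess.run(["ipcluster", "stop"], check=False)  # ignore "not running"
        time.sleep(2)  # give the old controller time to shut down
        subprocess.run(
            ["ipcluster", "start", "-n", str(n_engines), "--daemonize"], check=True
        )
        deadline = time.time() + timeout
        while True:
            try:
                client = Client()
                if len(client.ids) >= n_engines:  # all engines registered
                    return client
                client.close()
            except Exception:  # controller connection files not ready yet
                pass
            if time.time() > deadline:
                raise TimeoutError("engines did not register in time")
            time.sleep(1)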

Ash

1 Answer


ipyparallel's Client objects (as well as DirectView and BroadcastView objects) have a clear() method (documentation) that clears the namespaces of the engines. For instance:

>>> from ipyparallel import Client
>>> client = Client()
>>> dview = client[:]
>>> dview.block = True
>>> dview.execute('import TC')
<AsyncResult: execute:finished>
>>> dview.apply(dir)
[['In', 'Out', 'TC', '_6f3c4b7b7576b8f6a12531042d4da9e4_5_args', '_6f3c4b7b7576b8f6a12531042d4da9e4_5_f', '_6f3c4b7b7576b8f6a12531042d4da9e4_5_kwargs', '_6f3c4b7b7576b8f6a12531042d4da9e4_5_result', '__builtin__', '__builtins__', ...
>>> client.clear(client.ids)
<Future at 0x2576553edf0 state=pending>
# The TC module is gone. What remains are built-in symbols, as well as some variables created when using apply()
>>> dview.apply(dir)
[['In', 'Out', '_6f3c4b7b7576b8f6a12531042d4da9e4_13_args', '_6f3c4b7b7576b8f6a12531042d4da9e4_13_f', '_6f3c4b7b7576b8f6a12531042d4da9e4_13_kwargs', '_6f3c4b7b7576b8f6a12531042d4da9e4_13_result', '__builtin__', '__builtins__', '__name__', ...

However, this function doesn't help with reloading a module on an engine, which is closer to what you're actually trying to do, because Python caches imported modules (in sys.modules). There doesn't seem to be a single way of reloading modules that always works; in addition to the question you linked, this question, this question and this question give some solutions for different situations.
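If you do need to force a fresh import without restarting the engines, one workaround is to drop the module (and any of its submodules) from sys.modules on each engine before importing it again. This is only a sketch: like the reload-based approaches, it only refreshes the modules you explicitly purge, and objects created from the old classes keep the old code.

    # Sketch: force a fresh import of TC on every engine by purging it from
    # sys.modules first. Only the purged modules are refreshed.
    from ipyparallel import Client

    client = Client()
    dview = client[:]
    dview.block = True
    dview.execute(
        "import sys\n"
        "for name in list(sys.modules):\n"
        "    if name == 'TC' or name.startswith('TC.'):\n"
        "        del sys.modules[name]\n"
        "import TC"
    )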