
According to this answer, it is quite simple to use scikit-learn's parallel features on a cluster run with ipcluster.

However, my example runs only on the node where it is started. I use the Anaconda distribution of IPython, scikit-learn and ipyparallel, all updated to the latest versions.

from sklearn.externals.joblib import Parallel, parallel_backend, register_parallel_backend
from ipyparallel import Client
from ipyparallel.joblib import IPythonParallelBackend
from sklearn.cluster import MeanShift

c = Client(profile='mpi')
print(c.ids)
bview = c.load_balanced_view()

register_parallel_backend('ipyparallel', lambda : IPythonParallelBackend(view=bview))

X = loaddata()
ms = MeanShift()
with parallel_backend('ipyparallel'):
    ms.fit(X)

I start the cluster with this command:

ipcluster start --profile=mpi -n 17 --log-level DEBUG --delay 5

This example runs in parallel, but only on the node where it was started (htop shows full processor utilization there). The ipcluster logs show a client connection, but no tasks are submitted to any of the nodes. Basic tests on this cluster show that it operates normally, and ipyparallel itself works as expected.
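
For reference, a basic test of this kind could look roughly like the sketch below. It is not the exact check I ran. The helper names `engines_per_host` and `collect_engine_hostnames` are made up for illustration, and it assumes the same `mpi` profile as above. It asks every engine for its hostname through a direct view and counts the engines per host, which shows whether the engines really are spread over several nodes.

```python
import socket
from collections import Counter


def engines_per_host(hostnames):
    """Count how many engines reported each hostname."""
    return dict(Counter(hostnames))


def collect_engine_hostnames(profile="mpi"):
    """Ask every engine of a running cluster for its hostname.

    Needs a cluster already started with ipcluster for the given profile.
    """
    # Imported lazily so that engines_per_host() works without ipyparallel.
    from ipyparallel import Client

    client = Client(profile=profile)
    # client[:] is a DirectView over all engines; apply_sync calls the
    # function once on each engine and returns the results as a list.
    return client[:].apply_sync(socket.gethostname)


if __name__ == "__main__":
    print(engines_per_host(collect_engine_hostnames()))
```

If the printed dictionary lists several hostnames, the engines are distributed over the nodes, and the problem lies in how the tasks are dispatched rather than in the cluster itself.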

Any ideas why this example is not distributed over all the nodes?
