I'm writing code that has to solve a large number of eigenvalue problems (the typical matrix dimension is a few hundred). I was wondering whether it is possible to speed the process up using the IPython.parallel
module. As a former MATLAB user and Python newbie I was looking for something similar to MATLAB's parfor
...
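(For context, the closest parfor-like pattern I found outside IPython.parallel is the standard library's multiprocessing.Pool; a minimal sketch, shown only for comparison:)

import numpy as np
from multiprocessing import Pool
from scipy.linalg import eigvals   # eigvals(a) is equivalent to eig(a, right=False)

def solve(a):
    # must be a module-level function so it can be pickled to the workers
    return eigvals(a)

if __name__ == '__main__':
    mats = [np.random.rand(300, 300) for _ in range(100)]
    pool = Pool()                  # defaults to one worker per CPU
    eigenvalues = pool.map(solve, mats)
    pool.close()
    pool.join()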
Following some tutorials online I wrote a simple piece of code to check whether it speeds the computation up at all, and I found that it doesn't and often actually slows it down (it seems to be case dependent). I think I might be missing a point here: maybe scipy.linalg.eig
is implemented in such a way that it already uses all the available cores, and by trying to parallelise it I interfere with its own thread management.
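One way to test that hypothesis is to cap the BLAS thread count before NumPy/SciPy are imported. Assuming the build links against an OpenMP-based BLAS such as MKL or OpenBLAS, the OMP_NUM_THREADS environment variable should be honoured:

# run eig single-threaded to see how much the multithreaded BLAS contributes;
# the variable must be set before numpy/scipy are first imported
import os
os.environ['OMP_NUM_THREADS'] = '1'

import time
import numpy as np
from scipy.linalg import eig

a = np.random.rand(300, 300)
t0 = time.time()
eig(a, right=False)
print('single-threaded eig: %.3f s' % (time.time() - t0))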
Here is the 'parallel' code:
import numpy as np
from scipy.linalg import eig
from IPython import parallel

# create the matrices
matrix_size = 300
matrices = {}
for i in range(100):
    matrices[i] = np.random.rand(matrix_size, matrix_size)

rc = parallel.Client()
lview = rc.load_balanced_view()
results = {}

# compute the eigenvalues
for i in range(len(matrices)):
    asyncresult = lview.apply(eig, matrices[i], right=False)
    results[i] = asyncresult

# collect the results; get() blocks until each task has finished
for i, asyncresult in results.iteritems():
    results[i] = asyncresult.get()
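An alternative pattern I came across is the view's map interface, which does the submit-and-collect loop itself. A sketch using scipy.linalg.eigvals, which computes eigenvalues only and so matches eig(a, right=False):

# same computation through the load-balanced view's map interface
from scipy.linalg import eigvals

amr = lview.map(eigvals, [matrices[i] for i in range(len(matrices))])
results = amr.get()   # blocks until all tasks have completed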
The non-parallelised variant:
# no parallel
for i in range(len(matrices)):
    results[i] = eig(matrices[i], right=False)
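To compare the two fairly, both variants need to be timed end to end; a sketch along these lines (note that the parallel timing has to include the .get() calls, otherwise only the task submission is measured):

import time

# parallel: time submission *and* collection of the results
t0 = time.time()
asyncresults = [lview.apply(eig, matrices[i], right=False) for i in range(len(matrices))]
par_results = [ar.get() for ar in asyncresults]   # get() blocks until done
print('parallel: %.3f s' % (time.time() - t0))

# serial reference on the same matrices
t0 = time.time()
ser_results = [eig(matrices[i], right=False) for i in range(len(matrices))]
print('serial:   %.3f s' % (time.time() - t0))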
The difference in CPU time between the two variants is very subtle. And if, on top of the eigenvalue problem, the parallelised function has to do some additional matrix operations, it starts to take forever, i.e. at least 5 times longer than the non-parallelised variant.
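One suspicion I have is per-task overhead: each apply call pickles a 300-by-300 float64 matrix (about 0.7 MB) over to an engine and ships the eigenvalues back, which could easily swamp the few tens of milliseconds the decomposition itself takes. A chunked variant that would amortise this by sending several matrices per task (a sketch, not benchmarked):

# send the matrices to the engines in chunks to amortise per-task overhead
def eig_batch(mats):
    # runs on the engine, so the import has to happen there
    from scipy.linalg import eig
    return [eig(m, right=False) for m in mats]

chunk = 10
batches = [[matrices[j] for j in range(i, i + chunk)]
           for i in range(0, len(matrices), chunk)]
asyncresults = [lview.apply(eig_batch, b) for b in batches]
results = [w for ar in asyncresults for w in ar.get()]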
Am I right that eigenvalue problems are not really suited for this kind of parallelisation, or am I missing the whole point?
Many thanks!
EDITED 29 Jul 2013; 12:20 BST
Following moarningsun's suggestion I tried to run eig
while fixing the number of threads with mkl.set_num_threads
. For a 500-by-500 matrix, the minimum times over a set of 50 repetitions are the following (a sketch of the measurement loop follows the table):
No. of threads   minimum time (timeit, s)   CPU usage (Task Manager)
=====================================================================
 1               0.4513775764796151         12-13%
 2               0.36869288559927327        25-27%
 3               0.34014644287680085        38-41%
 4               0.3380558903450037         49-53%
 5               0.33508234276183657        49-53%
 6               0.3379019065051807         49-53%
 7               0.33858615048501406        49-53%
 8               0.34488405094054997        49-53%
 9               0.33380300334101776        49-53%
10               0.3288481198342197         49-53%
11               0.3512653110685733         49-53%
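A sketch of the measurement loop, assuming the mkl module that exposes set_num_threads (as in moarningsun's suggestion) is importable:

import timeit
import numpy as np
import mkl
from scipy.linalg import eig

a = np.random.rand(500, 500)
for n in range(1, 12):
    mkl.set_num_threads(n)
    # best of 50 single runs, as in the table above
    t = min(timeit.repeat(lambda: eig(a, right=False), number=1, repeat=50))
    print('%2d threads: %.6f s' % (n, t))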
Apart from the one-thread case there is no substantial difference (maybe 50 samples is a bit too small...). I still think I'm missing the point and that a lot could be done to improve the performance, but I'm not really sure how. These were run on a machine with 4 physical cores and hyperthreading enabled, giving 8 virtual cores (which matches the ~50% CPU ceiling above).
Thanks for any input!