14

I would like to compute the eigenvalues of large-ish matrices (about 1000x1000) using Python 2.6.5. I have been unable to do so quickly. I have not found any other threads addressing this question.

When I run

a = rand(1000,1000);
tic;
for i =1:10
    eig(a);
end
toc;

in MATLAB it takes about 30 seconds. A similar test in Python requires 216 seconds. Running it through R using RPy did not speed up the computation appreciably. A test in Octave took 93 seconds. I am a bit baffled at the difference in speed.

The only similar question I could find online is this one, which is several years old. The poster there has a different Python directory structure (which I attribute to the age of the post, although I could be mistaken), so I have not been confident enough to follow the instructions in that answer.

My package manager says that I have LAPACK installed, and I am using NumPy and SciPy for the Python calculations:

from numpy import *
from scipy import *
from numpy.linalg import *
import time

a = randn(1000,1000)
tic = time.clock()
for i in range(0,10):
    eig(a)
toc = time.clock()
print "Elapsed time is ", toc-tic

I am pretty new to Python, so I may have done something silly. Please let me know if I need to provide any more information.

eat
Jamie D
    Are you using the same precision in Python and Matlab? – Dietrich Epp May 18 '11 at 22:19
    You do need to make sure your indentation in your Python code is exactly correct. Your example can't run as written. – S.Lott May 18 '11 at 22:22
  • Just to let you know: use `rand` for both (uniform distribution) or `randn` for both (normal distribution), but don't mix them. This doesn't explain the timing difference, though. – Wok May 18 '11 at 22:36
  • What would the timings be with `svd`? Also, please reformat your code properly, don't import things you aren't using, and try to avoid the `from . import *` idiom. Thanks – eat May 18 '11 at 22:43
  • I have tried `svd` (3 seconds *without the loop*) and `eigvals` (12 seconds), to be compared to 30 seconds with `eig`. Although, with `svd`, you only get the singular values of a'*a, not a. – Wok May 18 '11 at 22:45
  • I meant the square roots of the eigenvalues of a'*a. – Wok May 18 '11 at 22:53
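To illustrate the point raised in the comments about `eigvals`, here is a minimal sketch (not from the original post) comparing the two calls: `numpy.linalg.eigvals` computes eigenvalues only and skips the eigenvector computation, which is why it comes out faster in Wok's measurements.

```python
import time
import numpy as np

a = np.random.randn(1000, 1000)

t0 = time.time()
np.linalg.eig(a)       # eigenvalues AND eigenvectors
t1 = time.time()
np.linalg.eigvals(a)   # eigenvalues only; skips the eigenvector work
t2 = time.time()

print("eig:     %.2f s" % (t1 - t0))
print("eigvals: %.2f s" % (t2 - t1))
```

The exact ratio depends on the LAPACK build, but if you only need the spectrum, `eigvals` is the cheaper call.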

2 Answers

15

I think what you're seeing is the difference between the Intel Math Kernel Library (MKL) that's being used by Matlab and whatever LAPACK implementation you have on your system (ATLAS, maybe?) that scipy is linked against. You can see how much faster the MKL is in these benchmarks.

I imagine that you would get much better performance if you could rebuild Scipy against the Intel MKL libraries. If you're using Windows, pre-built copies can be downloaded from here, or you might consider using something like the Enthought Python Distribution.
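To find out which BLAS/LAPACK implementation a NumPy build is actually linked against, a quick check (the output format varies between versions) is:

```python
import numpy

# Prints the BLAS/LAPACK libraries numpy was built against.
# An MKL-linked build mentions "mkl" in the library lists;
# an ATLAS build mentions "atlas".
numpy.__config__.show()
```

SciPy exposes the same hook as `scipy.__config__.show()`, so you can confirm both packages are using the library you expect.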

Ray
    That's certainly a large part of it. Comparing `np.linalg.eig` on a version of numpy using ATLAS vs one linked against the MKL results in a roughly 3-fold difference on my machine for an array of the OP's size. (10.1 sec vs 3.2 sec) – Joe Kington May 19 '11 at 00:02
3

I do get a difference in timings, but not as drastic as yours: my MATLAB (R2010b) timing was ~25 seconds and my Python (2.7) timing was ~60 seconds.

I'm not really surprised by these numbers: MATLAB is solely a numerical and matrix-manipulation language, and it has the advantage of its JIT accelerator over Python, which is a general-purpose language. Generally the differences between MATLAB and Python+NumPy are quite small, but they become apparent when the matrix is large, as in your case.

That doesn't mean there aren't ways to improve Python's performance. The PerformancePython article on SciPy's website gives a good introduction to the different ways you can speed up Python code.
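One small, easy improvement worth making regardless: the standard-library `timeit` module avoids the resolution and CPU-time-vs-wall-time pitfalls of `time.clock` when benchmarking. A sketch of the question's benchmark using it:

```python
import timeit

# Time 10 eig() calls on a 1000x1000 matrix, as in the question.
setup = "import numpy as np; a = np.random.randn(1000, 1000)"
elapsed = timeit.timeit("np.linalg.eig(a)", setup=setup, number=10)
print("10 eig calls took %.1f s" % elapsed)
```

`timeit` runs the setup once, times only the statement, and uses the most precise clock available on the platform.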

cel
abcd