11

I have tried these two alternatives:

objective = lambda A, x : (np.dot(x.T ,np.dot(A, x)))[0,0]
objective = lambda A, x : (np.matrix(x).T * np.matrix(A) * np.matrix(x))[0,0]

With the first one my algorithm runs in 5 sec; with the second, 14 sec.

With MATLAB I got 2 sec

I want to stick with NumPy, but obviously I need a way to improve these poor results. How can I compute the quadratic form (matrix-vector product) faster?
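For reference, keeping `x` as a plain 1-D array avoids both the `np.matrix` overhead and the `[0, 0]` indexing, since `np.dot` then returns a true scalar (the size here is an assumption; the question does not state the actual dimensions):

```python
import numpy as np

n = 1500  # hypothetical size
A = np.random.rand(n, n)
x = np.random.rand(n)  # 1-D array, not an (n, 1) column matrix

# With a 1-D x the result is already a scalar -- no [0, 0] indexing needed
objective = lambda A, x: np.dot(x, np.dot(A, x))
val = objective(A, x)
```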

Note: I profiled the code, and this lambda function accounts for nearly all of the running time. Improvement: I removed the native Ubuntu packages of scipy and numpy, then installed them with the following:

sudo pip install numpy
sudo apt-get install libatlas-base-dev gfortran
sudo pip install scipy
sudo apt-get install libpng-dev libfreetype6-dev
sudo pip install matplotlib 

This improved the performance a bit, but it is still slower than Matlab.

erogol
  • 13,156
  • 33
  • 101
  • 155
  • what is the `[0, 0]` at the end? – Bi Rico Jan 30 '14 at 00:14
  • I need a single scalar value but it returns array with single value – erogol Jan 30 '14 at 00:15
  • is it still as slow if you actually define the functions? lambdas are a little slower I *think* – Joran Beasley Jan 30 '14 at 00:22
  • 1
    How big are your matrices? If they are big, the speed bottleneck is not in Numpy. Obtain a copy of Numpy linked against Intel MKL or some other high-performance linear algebra library. – pv. Jan 30 '14 at 00:45
  • As someone interested in speeding up python, you might find [this blog about cython](http://nbviewer.ipython.org/github/carljv/cython_testing/blob/master/cython_linalg.ipynb) an interesting read. Sadly in this case, `cython` doesn't improve on `np.dot` – yardsale8 Jan 30 '14 at 01:30
  • Please include all the code (including the setup of A and x, and the timing code). – David Ketcheson Jan 30 '14 at 05:56
  • 1
    @Erogol: Matlab uses Intel MKL, which is better optimized than ATLAS. Instead of ATLAS, you may want to try OpenBLAS, which is also free and may perform better. – pv. Jan 30 '14 at 14:08
  • @pv how can I set openblas instead of atlas – erogol Jan 30 '14 at 14:44
  • @pv I also found this. Do you have any comment http://queforum.com/python/418459-python-build-numpy-eigen-instead-atlas-openblas.html – erogol Jan 30 '14 at 14:59

2 Answers

7

I have both NumPy and Matlab installed and they both take around 45 ms for a 10000x10000 matrix.

Considering your timings, I suspect that x is not a single column vector. If you want to do this computation for multiple column vectors all at once, look at my answer to this question: Calculate "v^T A v" for a matrix of vectors v. The timings you are listing are terribly slow if x is just a single column vector (in NumPy or Matlab).
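For the many-vectors case, the computation from that linked answer can be sketched with `np.einsum` (the shapes here are made up for illustration):

```python
import numpy as np

A = np.random.rand(5, 5)
V = np.random.rand(100, 5)  # 100 row vectors v

# Computes v^T A v for every row v of V in a single vectorized call
quad = np.einsum('ij,jk,ik->i', V, A, V)
```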

I suspect, however, that the difference could also come from how your NumPy installation was compiled. This is really a timing of the BLAS functions used by NumPy and Matlab. On my machine both are likely calling the same underlying library, since my NumPy is linked against Intel's MKL. If NumPy is built against a well-optimized BLAS like the MKL, large vector operations like this should run at roughly the same speed as Matlab, because both end up calling the same low-level BLAS routines. If your version of NumPy is not compiled against an optimized BLAS, performance will be worse.

If you know your installation of NumPy is already linked against the MKL, you can try setting the MKL_NUM_THREADS environment variable to match the number of processors on your system.
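For example (the thread count is an assumption; set the variable before NumPy is first imported so MKL picks it up):

```python
import os

# Hypothetical core count -- match this to your machine
os.environ['MKL_NUM_THREADS'] = '4'

import numpy as np  # imported after the variable is set
```

Setting the variable in the shell (`export MKL_NUM_THREADS=4`) before launching Python works as well.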

One simple way to get the properly compiled version of NumPy is to use a pre-built distribution. Anaconda and Enthought are very good but they will require a subscription to get the optimized versions. Academic licenses are available for free. You can also look here: http://www.lfd.uci.edu/~gohlke/pythonlibs/

Community
  • 1
  • 1
IanH
  • 10,250
  • 1
  • 28
  • 32
  • 1
    +1 as I get similar results. 10000x10000 matrices take about 400 ms on a 4 year old cheap laptop using Anaconda academic licenses. – yardsale8 Jan 30 '14 at 01:28
  • 1
    I suspect the report of "5 sec of running time with my algorithm" means the algorithm computes the quadratic form many times. – Warren Weckesser Jan 30 '14 at 03:53
  • Good point. It is rather open-ended. My suspicion was that they were timing the multiplication with both x and A as 1500x1500 arrays, but that could be wrong. – IanH Jan 30 '14 at 04:32
  • How can I check that my installation is bound to the correct set of back-end libraries? – erogol Jan 30 '14 at 10:52
  • Okay, In case anyone else still has this question, you can check if you are using MKL using `import numpy` and then `numpy.show_config()` If you are using MKL, the different "libraries" options should have several items mentioning MKL. – IanH Jan 30 '14 at 18:51
3

Finally, what I did was change the library that NumPy binds to for its linear algebra functions. By default it was using ATLAS, but it took some effort (about 4 hours) to switch it to OpenBLAS. I found the guide Compiling numpy with OpenBLAS integration and followed it step by step. The result runs faster. It still lags behind Matlab (Intel MKL) at 2.5 sec, but a 3 sec execution is tolerable.
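To check whether the rebuilt NumPy actually helps, the quadratic form can be timed directly; a minimal sketch (the matrix size is an assumption):

```python
import numpy as np
import timeit

n = 1500  # hypothetical size
A = np.random.rand(n, n)
x = np.random.rand(n)

# With a well-linked BLAS, the dot products dominate the cost here
t = timeit.timeit(lambda: np.dot(x, np.dot(A, x)), number=100)
print('100 evaluations: %.3f s' % t)
```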

Community
  • 1
  • 1
erogol
  • 13,156
  • 33
  • 101
  • 155