4

I need to perform a matrix/vector multiplication in Matlab at very large sizes: "A" is a 655360-by-5 real-valued matrix that is not necessarily sparse, and "B" is a 655360-by-1 real-valued vector. My question is how to compute B'*A efficiently.

I have noticed a slight time improvement by computing A'*B instead, which gives a column vector. But it is still quite slow (I need to perform this operation several times in the program).

After a bit of searching I found an interesting Matlab toolbox, MTIMESX by James Tursa, which I hoped would improve the performance of the above matrix multiplication. After several trials, I could only achieve very marginal gains over Matlab's native matrix multiplication.

Any suggestions on how I should rewrite A'*B so that the operation is more efficient? Thanks.
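For reference, here is a minimal sketch of how the two formulations can be timed against each other (random data of the stated sizes; `timeit` is assumed available, otherwise `tic`/`toc` works):

```matlab
% Hypothetical benchmark: the two products give the same values,
% one as a 1-by-5 row vector, the other as a 5-by-1 column vector.
A = rand(655360, 5);   % dense, real-valued, as in the question
B = rand(655360, 1);

t1 = timeit(@() B' * A);   % 1-by-5 row vector
t2 = timeit(@() A' * B);   % 5-by-1 column vector

fprintf('B''*A: %.4g s   A''*B: %.4g s\n', t1, t2);
```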

Amro
  • 123,847
  • 25
  • 243
  • 454
Cowboy
  • 41
  • 1
  • 2
  • 1
    I think for matrix operations, Matlab's performance is already close to the best you can get, since matrix ops are already optimized and parallelized. – jpjacobs Oct 04 '11 at 08:50
  • As many here have mentioned, Matlab should have no problem handling such a matrix multiplication. However, your question suggests there is something very wrong with your code or your system: multiplying vectors of this size on my i7 machine takes around 0.003 seconds. Even if we assume older machines are 300 times slower, the computation should take under a second! There isn't supposed to be a memory issue either, since matrix "A" requires only 26 MB of memory. – Yanir Kleiman Jan 03 '13 at 13:53

5 Answers

10

Matlab's raison d'être is doing matrix computations. I would be fairly surprised if you could significantly outperform its built-in matrix multiplication with hand-crafted tools. First of all, you should check whether the multiplication can actually be performed significantly faster at all; you could do this by implementing a similar multiplication in C++ with Eigen.

thiton
  • 35,651
  • 4
  • 70
  • 100
3

I have had good results with Matlab matrix multiplication using the GPU.
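A minimal sketch of the GPU approach (requires the Parallel Computing Toolbox and a supported GPU; whether it pays off for a tall-skinny product like this depends on how much the host-device transfer costs relative to the multiply):

```matlab
A = rand(655360, 5);
B = rand(655360, 1);

Ag = gpuArray(A);      % copy the data to GPU memory once
Bg = gpuArray(B);

Cg = Bg' * Ag;         % the multiplication runs on the GPU
C  = gather(Cg);       % copy the 1-by-5 result back to host memory
```

If the operation is repeated many times, keep `Ag` and `Bg` on the GPU across iterations so the transfer cost is paid only once.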

Maurits
  • 2,082
  • 3
  • 28
  • 32
  • 2
    The parallel computing capability seems to have been integrated into the new Matlab release with functions like "gpuArray". – Cowboy Oct 04 '11 at 16:28
1

In order to avoid the transpose operation, you could try:

sum(bsxfun(@times, A, B), 1)

This sums down the rows, producing the same 1-by-5 row vector as B'*A. But I would be astonished if it were faster than the direct version; see @thiton's answer.

Also look at http://www.mathworks.co.uk/company/newsletters/news_notes/june07/patterns.html to see why the column-vector-based version is faster than the row-vector-based version.

Nzbuu
  • 5,241
  • 1
  • 29
  • 51
    Thanks. Indeed it is very difficult to beat the native Matlab matrix multiplication. It takes more time if I use bsxfun together with sum. – Cowboy Oct 04 '11 at 09:54
1

Matlab is built on fairly optimized libraries (BLAS, etc.), so you can't easily improve upon it from within Matlab. Where you can improve is to get a better BLAS, such as one optimized for your processor: this enables better use of the caches by fetching appropriately sized blocks of data from main memory. Take a look into creating your own compiled versions of ATLAS, ACML, MKL, and Goto BLAS.

I wouldn't try to solve this one particular multiplication unless it's really killing you. Changing up the BLAS is likely to lead to a happier solution, especially if you're not currently making use of multicore processors.

Iterator
  • 20,250
  • 12
  • 75
  • 111
  • Can you elaborate on this a bit? How do I get a better BLAS and then tell Matlab to use it? If a better BLAS is available, why doesn't Matlab use it already? – littleO Jan 13 '16 at 19:03
0

Your #1 option, if this is your bottleneck, is to re-examine your algorithm. See this question Optimizing MATLAB code for a great example of how choosing a different algorithm reduced runtime by three orders of magnitude.

Marc
  • 3,259
  • 4
  • 30
  • 41