How to add two very large matrices efficiently?

Question

I have two very large matrices, and adding them with for loops takes a long time. I learned that matrices can be added using operator overloading in C++. Will doing so reduce the execution time?

It is too bad this isn't "StatementOverflow". Could you please reword your statement into a question? http://stackoverflow.com/questions/how-to-ask – CoderDake, Oct 23 '13 at 13:37
Adding them using bitwise operators or lookup tables or quantum computers takes time, too. First and foremost: Ask yourself about the benefit of whatever you do; if it gains you 1% better execution time, but 30% less readable code, don't. Look at other hotspots first. If it _costs_ you 1% execution time, but yields 20% better readable code, think about it. Always ponder benefit vs. cost, and don't guess, but use facts, or just try it and measure (trying costs you a bit, though). /philosophical — Sebastian Mach, Oct 23 '13 at 13:38
@phresnel: you have not done much performance critical linear algebra like 3D rendering, have you? These can be pretty expensive operations... It is generally speaking, an up-front design principle in certain projects. It does not need measurement time spent to get proof for the obvious. — László Papp, Oct 23 '13 at 13:51
@LaszloPapp: Oh, it happens that I have: [A](http://phresnel.org/archive/index.php?categoryid=14&PHPSESSID=96sudqcq3qsrv89luh7lv94i53), [B](http://phresnel.org/iotd.php), [C](http://picogen.org/), [D](https://code.google.com/p/tinscape/). However, I think you missed the point of my post. You write `It is generally speaking, an up-front design principle in certain projects`: But _what_ is that up-front design principle you claim I miss? Proof for _what_ obvious? Or am I just the wrong addressee of your comment? — Sebastian Mach, Oct 23 '13 at 14:09
@phresnel: Well, the context is performance optimization... Certain projects, especially 3D ones, have guidelines on what to do and what not to do. One does not need to prove the same performance problem for the 1000th time. That would be a total waste of time. ;-) – László Papp, Oct 23 '13 at 14:15
@LaszloPapp: Let me explain what my comment was about: It was said that adding matrices takes time. Of course this is true, because no known algorithm takes no time; therefore, that remark is superfluous. Then I gave a bit of insight about when to optimize in the general case, and when not. Of course this does not answer the question whether operator overloading yields faster code than plain functions, hence it was just a comment. – Sebastian Mach, Oct 23 '13 at 15:57
And about the guidelines: They are very different depending on the project: Do you only need 4x4 matrices, or arbitrary ones? Do you need 3-vectors? No expression templates needed. Or arbitrary and large n-vectors? Look into expression templates (hint: GCC comes with expression template valarray). Does your problem fit a streaming pipeline? Look into SIMD, OpenCL. Is your problem largely incoherent, in execution AND data? Look into OpenMP or Quasi Monte Carlo methods. You have tight inner loops? Squeeze the bytes and increase predictability. There is no general answer to the question. — Sebastian Mach, Oct 23 '13 at 16:01
@LaszloPapp: The thing is just, it depends. I just [recently ditched](https://github.com/phresnel/gaudy/commit/6c1271340f5861dfad7d6fd510f7eec0d2795e4e) passing RGB-triplets as references in my library, because after measurement, there was no performance difference at all on the systems I want to support. But it gained me and users a lot of readability. And I've seen different systems just enough to see that there is no magical performance recipe. Even within the ray tracing community, there is large disagreement between real time ray tracing (coherent) and offline ray tracing (incoherent). — Sebastian Mach, Oct 23 '13 at 16:42
@LaszloPapp: But as you want, I can only agree on disagreement. — Sebastian Mach, Oct 23 '13 at 16:43
@phresnel: I am not referring to a tiny RGB class, but the point in the question: a matrix. Anyway, our conversation makes the poor thread a bit unwieldy. :-) Feel free to drop me an email if you wanna talk about interesting C++ graphics stuff, or I am even on IRC. — László Papp, Oct 23 '13 at 17:30

score 5 · Answer 1 · edited May 23 '17 at 10:25

Moving the loops into an overloaded operator will make no difference.

One way to improve performance is by using a specialized library for this, such as BLAS. A quality BLAS implementation (for example, Intel's MKL) will be much faster than anything you are likely to hand-code.

For some pointers on C++ wrappers for BLAS, see the question "LAPACK wrappers for C/C++" (it asks about Windows, but the answers apply more broadly).

answered Oct 23 '13 at 13:37

NPE

Do you know how that performs compared to e.g. eigen? – László Papp Oct 23 '13 at 13:54
@LaszloPapp: No experience with eigen, but it looks like something the OP should check out. – NPE Oct 23 '13 at 14:01

score 1 · Answer 2 · answered Oct 23 '13 at 13:37

An overloaded operator is treated just like any other function in C++, so simply turning your addition function into an operator without changing the logic won't help.

To actually speed things up, you would probably need some form of SIMD computation.
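
To illustrate why the operator by itself changes nothing, here is a minimal sketch. The `Matrix` type and the `add` function are hypothetical names made up for this example, and the row-major `std::vector<double>` storage is an assumption, not something from the question. The operator simply forwards to the same loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix type: row-major storage in one contiguous buffer.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
};

// Plain function: a single pass over all elements.
Matrix add(const Matrix& a, const Matrix& b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    Matrix out(a.rows, a.cols);
    for (std::size_t i = 0; i < out.data.size(); ++i)
        out.data[i] = a.data[i] + b.data[i];
    return out;
}

// Overloaded operator: the same work, only spelled differently at the call site.
Matrix operator+(const Matrix& a, const Matrix& b) {
    return add(a, b);
}
```

Writing `a + b` and `add(a, b)` runs the same loop, so any difference in run time has to come from changing the loop itself or the memory layout.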

Bartek Banachewicz

score 0 · Answer 3 · answered Oct 23 '13 at 13:41

One way to do this is to use the vector instructions available in the x86 SIMD extensions (SSE). See this example: http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions#Example

If you use GCC or Visual Studio, both provide built-in intrinsics that you can call like ordinary functions instead of writing assembly.
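
As a rough sketch of what the intrinsic route can look like: the function name `add_simd` is hypothetical, and the example assumes the matrices hold `float` elements flattened into contiguous buffers. The intrinsics come from `<xmmintrin.h>`; a scalar fallback is used when SSE is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define HAVE_SSE 1
#endif

// Hypothetical helper: element-wise sum of two flattened float matrices.
std::vector<float> add_simd(const std::vector<float>& a,
                            const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#ifdef HAVE_SSE
    // Process four floats per iteration; unaligned loads and stores
    // work for any buffer address.
    for (; i + 4 <= a.size(); i += 4) {
        __m128 va = _mm_loadu_ps(a.data() + i);
        __m128 vb = _mm_loadu_ps(b.data() + i);
        _mm_storeu_ps(out.data() + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop for the leftover elements (or for everything without SSE).
    for (; i < a.size(); ++i)
        out[i] = a[i] + b[i];
    return out;
}
```

Optimizing compilers can often vectorize a plain addition loop on their own, so it is worth measuring against the simple version before keeping the intrinsics.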

score 0 · Answer 4 · answered Oct 23 '13 at 13:45

Store your matrices in a `std::valarray`, or use a specialized library such as Eigen. BLAS has an ugly interface and, unless you have access to a commercial implementation, no longer performs particularly well.
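
A minimal sketch of the `std::valarray` option, using only the standard library. The wrapper type `VMatrix` and its row-major layout are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical wrapper: a rows x cols matrix stored row-major in a valarray.
struct VMatrix {
    std::size_t rows, cols;
    std::valarray<double> data;
    // Note the argument order of the valarray constructor: value first, then count.
    VMatrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(v, r * c) {}
};

// In-place addition avoids allocating a third matrix.
VMatrix& operator+=(VMatrix& a, const VMatrix& b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    a.data += b.data;  // element-wise, provided by std::valarray
    return a;
}

// Out-of-place addition implemented on top of +=.
VMatrix operator+(VMatrix a, const VMatrix& b) {
    a += b;
    return a;
}
```

For very large matrices, `a += b` may be preferable to `c = a + b` because it does not need memory for a separate result.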

Slava