
I am looking into doing some quadratic programming and have been comparing different libraries. I have seen various Eigen variants of QuadProg++ (KDE Forums, Benjamin Stephens, StackOverflow posts). As a test, I forked wingsit's Eigen variant, available on GitHub, to implement compile-time-sized problems via templates and measure the performance difference.

I'm finding that the templated case performs worse than the dynamically sized (MatrixXd / VectorXd) case from wingsit's code. I know this is not a simple question, but can anyone see a reason why this might be?
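For reference, the change amounts to swapping the dynamic Eigen types for compile-time-sized ones, roughly along these lines (a minimal sketch with hypothetical signatures, not the actual fork):

#include <Eigen/Dense>

// Dynamic-sized interface, as in wingsit's original code.
double solve_quadprog(const Eigen::MatrixXd& G, const Eigen::VectorXd& g0,
                      const Eigen::MatrixXd& CI, const Eigen::VectorXd& ci0,
                      Eigen::VectorXd& x);

// Compile-time-sized interface, with N variables and M inequality
// constraints fixed as template parameters.
template <int N, int M>
double solve_quadprog(const Eigen::Matrix<double, N, N>& G,
                      const Eigen::Matrix<double, N, 1>& g0,
                      const Eigen::Matrix<double, N, M>& CI,
                      const Eigen::Matrix<double, M, 1>& ci0,
                      Eigen::Matrix<double, N, 1>& x);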

Note: I do need to increase the problem size / iteration count; I will post those numbers once I can.

EDIT: I am using GCC 4.6.3 on Ubuntu 12.04. These are the flags that I am using (modified from wingsit's code):

CFLAGS  = -O4 -Wall -msse2 -fopenmp      # option for obj
LFLAGS  = -O4 -Wall -msse2 -fopenmp      # option for exe (-lefence ...)
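# note: GCC treats optimization levels above -O3 as -O3, so -O4 is effectively -O3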
eacousineau
  • You should provide at least the compiler name and its version to make your post more interesting; the complete command and all the build steps would be an even better starting point. – user1802174 Nov 15 '12 at 17:15

1 Answer


The benefits of static-sized code generally diminish as the sizes grow. The typical benefits of static-sized code mainly include (but are not limited to) the following:

  • stack-based allocations, which are faster than heap allocations. However, at large sizes, stack-based allocations are no longer feasible (stack overflow), nor even beneficial from the point of view of prefetching and locality of reference.
  • loop unrolling: when the compiler sees a small static-sized loop, it can simply unroll it, and possibly use SSE instructions. This doesn't apply at larger sizes.

In other words, for small sizes (up to maybe N = 12 or so), static-sized code can be much better and faster than the equivalent dynamically-sized code, as long as the compiler is fairly aggressive about inlining and loop unrolling. But when the sizes are larger, there is no point.
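To make that concrete, here is a minimal sketch of the two cases in Eigen (my own illustration, not the OP's code). For the fixed-size version, the compiler knows all loop bounds and the storage lives on the stack; for the dynamic version, it knows neither:

#include <Eigen/Dense>

// Fixed-size: bounds known at compile time, so the product can be fully
// unrolled and vectorized, and the result lives on the stack.
Eigen::Matrix4d mul_static(const Eigen::Matrix4d& a, const Eigen::Matrix4d& b) {
    return a * b;
}

// Dynamic-size: bounds known only at run time, so Eigen uses generic loops
// and heap-allocates the result.
Eigen::MatrixXd mul_dynamic(const Eigen::MatrixXd& a, const Eigen::MatrixXd& b) {
    return a * b;
}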

Additionally, static-sized code has a number of drawbacks too:

  • No efficient move semantics / swapping / copy-on-write strategies, since such code is usually implemented with static arrays (in order to have the benefits mentioned above), which cannot simply be swapped (as in swapping an internal pointer).
  • Larger executables whose functions are spread out. Say you have one function template to multiply two matrices: a new compiled function (inlined or not) is generated for each size combination. Now suppose some algorithm does a lot of matrix multiplications; each multiplication has to jump to (or execute inline) the specialization for its size combination. In the end, you have a lot more code that needs to be cached, cache misses become much more frequent, and that will completely destroy the performance of your algorithm (see the sketch after this list).
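To sketch that last point (my own illustration, with a hypothetical mul helper): every distinct size combination instantiates a separate copy of the function.

#include <Eigen/Dense>

template <int M, int K, int N>
Eigen::Matrix<double, M, N> mul(const Eigen::Matrix<double, M, K>& a,
                                const Eigen::Matrix<double, K, N>& b) {
    return a * b;
}

void example() {
    Eigen::Matrix<double, 3, 4> a; a.setRandom();
    Eigen::Matrix<double, 4, 5> b; b.setRandom();
    Eigen::Matrix<double, 5, 6> c; c.setRandom();
    auto ab  = mul(a, b);    // instantiates mul<3, 4, 5>
    auto abc = mul(ab, c);   // instantiates mul<3, 5, 6>: another copy of the code
    (void)abc;
}

With many size combinations in play, these copies add up and compete for the instruction cache.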

So, the lesson to draw from this is to use static-sized vectors and matrices only for small things, like 2- to 6-dimensional vectors or matrices. Beyond that, it's preferable to go with dynamically-sized code (or to try static-sized code, but verify that it actually performs better, because that is not guaranteed). So, I would advise you to reconsider using static-sized code for larger problems.

Mikael Persson
  • You aren't really talking about the size of the code, are you? – Ben Voigt Nov 15 '12 at 18:44
  • @BenVoigt Except for the last bullet-point, anywhere that I talk about "size" I mean the number of elements of the vectors/matrices involved. As the OP is talking about a QP algorithm using Eigen (which has a number of vector/matrix class templates for compile-time sizes), it is that which I am referring to. – Mikael Persson Nov 16 '12 at 07:38
  • I thought that's what you meant. Unfortunately, that's not what your answer *says*. You could probably improve the wording somewhat. – Ben Voigt Nov 16 '12 at 15:38
  • Thank you for the comprehensive explanation; I feel like the issues you mentioned in the second paragraph might be the reason the code is slower. I had implemented a larger optimization problem with ~10 variables and ~20 constraints (lower and upper bounds), and the dynamic code (run time ~350 ms) out-performed the static code (run time ~1900 ms) by quite a bit, using the same compiler flags. I think in the future I will try Benjamin Stephens's version, because he seems to leverage Eigen's SSE features. – eacousineau Nov 16 '12 at 15:53