3

I'm writing a program that depends a lot on complex additions and multiplications. I wanted to know whether I should use gsl_complex or std::complex.

I can't seem to find a comparison online of how GSL's complex arithmetic performs compared to std::complex. A rudimentary Google search didn't turn up a benchmarks page for GSL complex either.

I wrote a 20-line program that generates two random arrays of complex numbers (1e7 of them each) and then measured how long addition and multiplication took using clock() from <ctime>. Using this method (without compiler optimisation) I found that gsl_complex_add and gsl_complex_mul are almost twice as fast as std::complex<double>'s + and * respectively. But I've never done this sort of thing before, so is this even the right way to check which is faster?
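
Roughly, the timing harness looked like the sketch below. This is a simplified reconstruction rather than my exact 20-line program: the array size and the use of clock() are as described above, everything else is illustrative.

// Simplified sketch of the timing harness described above (not the exact
// 20-line program): two arrays of 1e7 random complex numbers, element-wise
// + and * timed with clock(). Note the six arrays take roughly 1 GB.
#include <complex>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>

#include <gsl/gsl_complex.h>
#include <gsl/gsl_complex_math.h>

int main()
{
    const std::size_t N = 10000000;   // 1e7 elements

    std::vector<std::complex<double> > a(N), b(N), c(N);
    std::vector<gsl_complex> ga(N), gb(N), gc(N);

    // Fill both pairs of arrays with the same random values.
    for (std::size_t i = 0; i < N; ++i) {
        const double re1 = std::rand() / double(RAND_MAX);
        const double im1 = std::rand() / double(RAND_MAX);
        const double re2 = std::rand() / double(RAND_MAX);
        const double im2 = std::rand() / double(RAND_MAX);
        a[i] = std::complex<double>(re1, im1);
        b[i] = std::complex<double>(re2, im2);
        ga[i] = gsl_complex_rect(re1, im1);
        gb[i] = gsl_complex_rect(re2, im2);
    }

    const std::clock_t t0 = std::clock();
    for (std::size_t i = 0; i < N; ++i) c[i] = a[i] + b[i];
    const std::clock_t t1 = std::clock();
    for (std::size_t i = 0; i < N; ++i) c[i] = a[i] * b[i];
    const std::clock_t t2 = std::clock();
    for (std::size_t i = 0; i < N; ++i) gc[i] = gsl_complex_add(ga[i], gb[i]);
    const std::clock_t t3 = std::clock();
    for (std::size_t i = 0; i < N; ++i) gc[i] = gsl_complex_mul(ga[i], gb[i]);
    const std::clock_t t4 = std::clock();

    std::cout << "std add: " << double(t1 - t0) / CLOCKS_PER_SEC << " s\n"
              << "std mul: " << double(t2 - t1) / CLOCKS_PER_SEC << " s\n"
              << "gsl add: " << double(t3 - t2) / CLOCKS_PER_SEC << " s\n"
              << "gsl mul: " << double(t4 - t3) / CLOCKS_PER_SEC << " s\n";

    // Use the results afterwards so the work is less likely to be optimised away.
    std::cout << c[N / 2] << " " << GSL_REAL(gc[N / 2]) << "\n";
    return 0;
}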

Any links or suggestions would be helpful. Thanks!

EDIT:

Okay, so I tried again with the -O3 flag, and now the results are completely different! std::complex<float>::operator+ is more than twice as fast as gsl_complex_add, while gsl_complex_mul is about 1.25 times as fast as std::complex<float>::operator*. If I use double, gsl_complex_add is about 30% faster than std::complex<double>::operator+, while std::complex<double>::operator* is about 10% faster than gsl_complex_mul. I only need float-level precision, but I've heard that double is faster (and memory is not an issue for me)! So now I'm really confused!

Praveen
  • Is that with optimisation? – Mats Petersson Sep 07 '13 at 10:09
  • Your benchmark is almost certainly broken, regardless of what the results are. Benchmarking is hard ;-) –  Sep 07 '13 at 10:14
  • No, without (like I've mentioned in the question). – Praveen Sep 07 '13 at 10:14
  • @delnan Okay, maybe I didn't really mean benchmarking in the technical sense of the term! How do I figure out what I should use? Is this check good enough? – Praveen Sep 07 '13 at 10:16
  • From the rest of your question, you really did mean benchmarking in the technical sense of the term. It's just that hard to determine and compare performance reliably and meaningfully. That's why this question is rather tough, and the answer almost certainly depends on what you want to do with complex numbers. –  Sep 07 '13 at 10:23
  • @delnan Well, the complex numbers are signals, so most of the operations go into doing stuff like convolution and correlation. – Praveen Sep 07 '13 at 10:41
  • @MatsPetersson I've added the results of a run with optimisation now. And I can't tell a thing about which to choose. – Praveen Sep 07 '13 at 10:58
  • 1
    Can you share your benchmarking code? – pyCthon Sep 07 '13 at 15:05

2 Answers

4

Turn on optimisations.

Any library or set of functions that you link with will have been compiled WITH optimisation (unless the developers are named Kermit, Swedish Chef, Miss Piggy (project manager) and Cookie Monster (tester) - in other words, the development team is a bunch of Muppets).

Since std::complex is a template, it is compiled with whatever settings you give the compiler for your own code, so without optimisation flags that code will be unoptimised. So your question is really "Why is function X faster than function Y that does the same thing, when function X is compiled with optimisation and Y is compiled without?" - which should really be obvious to answer: "Optimisation works nearly all of the time!" (If optimisation didn't work most of the time, compiler developers would have a MUCH easier time.)

Edit: So my above point has just been proven. Note that since templates can inline the code, it is often more efficient than an external library (because the compiler can just insert the instructions straight into the flow, rather than calling out to another function).
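
To illustrate, here is a schematic example of my own (not code from either library): when the compiler can see the body of an operation, as it can with a template defined in a header, it can fold that body straight into the surrounding loop; a call into a separately compiled library stays a real call on every iteration (unless you use link-time optimisation).

// Schematic illustration only. mul_inline mimics a header/template
// definition (visible to the compiler, so it can be inlined and vectorised
// inside the loop); mul_external mimics a function compiled into a separate
// library, which the compiler must call through its ABI on every iteration.
struct cplx { double re, im; };

// Definition visible at the call site -- a candidate for inlining.
inline cplx mul_inline(cplx a, cplx b)
{
    cplx r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

// Only the declaration is visible here; the body lives in another
// translation unit (or a prebuilt library), so each use is a real call.
cplx mul_external(cplx a, cplx b);

void multiply_all(cplx* out, const cplx* x, const cplx* y, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = mul_inline(x[i], y[i]);      // body can be merged into the loop
        // out[i] = mul_external(x[i], y[i]); // would stay a per-element call
}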

As to float vs. double, the only time that float is slower than double is if there is ONLY double hardware available, with two functions added to "shorten" and "lengthen" between float and double. I'm not aware of any such hardware. double has more bits, so it SHOULD take longer.

Edit2:

When it comes to choosing "one solution over another", there are so many factors. Performance is one (and in some cases, the most important, in other cases not). Other aspects are "ease of use", "availability", "fit for the project", etc.

If you look ONLY at performance, you can sometimes run simple benchmarks to determine that one solution is better or worse than another. But for complex libraries [not "real & imaginary" type complex numbers, but rather "complicated" ones], there are sometimes optimisations for dealing with large amounts of data, and a less sophisticated solution will not achieve the same performance on "large data" because less effort has been spent on solving the "big data" type problems. So if you have a "simple" benchmark that does some basic calculations on a small set of data, but in reality you are going to run much bigger datasets, the small benchmark MAY not reflect reality.

And there is no way that I, or anyone else, can tell you which solution will give you the best performance on YOUR system with YOUR datasets, unless we have access to your datasets, know exactly which calculations you are performing (that is, pretty much have your code), and have experience running that with both "packages".

And going on to the rest of the criteria ("ease of use", etc), those are much more "personal opinion" based, so wouldn't be a good fit for an SO question in the first place.

Mats Petersson
  • That info sure helps. But it still doesn't give me a convincing argument to choose one over the other, as I've indicated in my question currently. I'll attempt a few more tests and edit back. – Praveen Sep 07 '13 at 11:19
  • "Help me choose a solution" is an opinionbased answer, because there is nearly never a true technical answer that says "Yes" or "No" to a particular solution. You will have to choose one or the other based on what you believe is the best solution, not ask SO. – Mats Petersson Sep 07 '13 at 11:25
  • I see. I guess I was hoping that the answer would be as clear cut as "Should I use numpy vector operations or for loops to sum two big arrays in python?", to give a crude analogy. I'll take your advice. – Praveen Sep 07 '13 at 12:11
  • @Praveen: I have added a bit more to my answer. I still don't think we can give a good answer to the "Which should I choose?" type of question. Technical things, such as "use optimisation when using benchmarks", yes, absolutely. But "Which works better for me" is not one of those - because even if you sent us a complete benchmark with an accurate representation of the data you are likely to use, my machine may have better or worse performance than the machine you plan on using, skewing the results. – Mats Petersson Sep 07 '13 at 14:24
  • So I finally tested out an array of 4500 elements (the number I use in my program) and averaged out the time taken by 1e5 runs of the program, for both addition and multiplication. I turned on -O3 and -ffast-math, since I'm not too keen on accuracy. It turns out that std::complex and gsl_complex_float (with custom addition and multiplication functions based on gsl_complex_add and gsl_complex_mul) fare about the same. So I'm choosing std::complex for ease of implementation. Thanks for all the help! – Praveen Sep 17 '13 at 13:19
  • One note: PowerPC CPUs are an example of hardware where (non-vector) floating-point arithmetic is done in double-precision, but results can be rounded to single-precision. Single-precision data is extended to double when loaded into floating-point registers. – p_a_c Aug 04 '15 at 19:50
1

The answer depends not only on the optimization flags, but also on the compiler used to compile the GSL library and your particular code. Example: if you compile GSL with gcc and your program with icc, you may see a (significant) difference (I have done this test with std::pow vs gsl_pow). Also, the standard makefile generated by ./configure does not compile GSL with aggressive floating-point optimizations (for example, it does not include gcc's -ffast-math flag) because some GSL routines (the differential equation solvers, for example) fail their stringent accuracy tests when these optimizations are present.

One of the great points about GSL is the modularity of the library. If you don't need double accuracy, you can compile gsl_complex.h, gsl_complex_math.h and math.c separately with aggressive floating-point optimizations (however, you need to delete the line #include <config.h> in math.c). Another strategy is to compile a separate version of the whole library with aggressive floating-point optimizations and test whether accuracy is an issue for your particular problem (that is my favorite approach).

EDIT: I forgot to mention that gsl_complex.h also has a float version of gsl_complex:

typedef struct
  {
    float dat[2];
  }
gsl_complex_float;
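
As far as I know, GSL does not ship arithmetic routines for gsl_complex_float, so (as Praveen mentions in the comments) you have to write the two operations yourself. A minimal sketch, modelled on gsl_complex_add and gsl_complex_mul - the function names here are mine, not part of GSL:

/* Sketch only (function names are mine, not part of GSL): single-precision
   add and mul in the style of gsl_complex_add / gsl_complex_mul, which the
   library provides only for the double-precision gsl_complex type. */
#include <gsl/gsl_complex.h>

static inline gsl_complex_float
my_complex_float_add(gsl_complex_float a, gsl_complex_float b)
{
    gsl_complex_float z;
    z.dat[0] = a.dat[0] + b.dat[0];   /* real part */
    z.dat[1] = a.dat[1] + b.dat[1];   /* imaginary part */
    return z;
}

static inline gsl_complex_float
my_complex_float_mul(gsl_complex_float a, gsl_complex_float b)
{
    gsl_complex_float z;
    z.dat[0] = a.dat[0] * b.dat[0] - a.dat[1] * b.dat[1];
    z.dat[1] = a.dat[0] * b.dat[1] + a.dat[1] * b.dat[0];
    return z;
}
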
Vivian Miranda