
I have a high-precision ODE (ordinary differential equation) solver written in C++. I do all calculations with a user-defined type real_type. There is a typedef declaring this type in the header:

typedef long double real_type;

I decided to change the long double type to __float128 for more accuracy. In addition, I included quadmath.h and replaced all standard math functions with their counterparts from libquadmath.
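For concreteness, a minimal sketch of the kind of change described above; the rhs function here is hypothetical, but quadmath.h, sqrtq, expq, quadmath_snprintf and the -lquadmath link flag are the real libquadmath pieces:

#include <quadmath.h>   // sqrtq, expq, quadmath_snprintf, FLT128_EPSILON, ...

// Previously: typedef long double real_type; (with sqrtl, expl, ...)
typedef __float128 real_type;

// Hypothetical right-hand side of an ODE, just to show the substitution:
real_type rhs(real_type t, real_type y)
{
    return expq(-t) * sqrtq(y);   // was: expl(-t) * sqrtl(y)
}

// Printing needs quadmath_snprintf instead of printf("%Lg", ...).
// Assumed build command: g++ -O3 solver.cpp -lquadmath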

If "long double" version is builded without any optimization flags, some reference ODE is solved in 77 seconds. If this version is builded with -O3 flag, the same ODE is solved in 25 seconds. Thus -O3 flag speeds up calculations in three times.

But in "__float 128" version builded without flags similar ODE is solved in 190 seconds, and with -O3 in 160 seconds (~ 15% difference). Why -O3 optimization does such a weak effect for quadruple precision calculations? Maybe I should use other compiler flags or include other libraries?

Alex Koksin
  • I post this as a comment, not as an answer, because it is just an "informed guess": compiler optimization flags enable/disable certain optimizations. Depending on your program, its data and control flow, the data types used, and so on, different optimizations have different effects. So just because one data type gets a good speedup with -O3 doesn't mean you get the same speedup after swapping in another data type, or after making any other change. So in my opinion, there is no reason to expect the same speedup. – loonytune Jun 26 '15 at 07:40
  • The __float128 operations are already compiled (optimized) in the library; they don't get recompiled for every application. The only way this might happen is if you compile libquadmath with -flto, which is not that easy, and you probably wouldn't gain much. – Marc Glisse Jun 26 '15 at 07:42
  • Thanks for your replies. Do you think the speed limit has been reached, or can I still get a further speedup? – Alex Koksin Jun 26 '15 at 07:46
  • If you want more precision but still high performance, [double-double arithmetic](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic) (or even triple-double, quad-double, ...) may be a good candidate, because it can be done in hardware rather than in software like quadruple-precision arithmetic. http://stackoverflow.com/q/9857418/995714 http://stackoverflow.com/q/6769881/995714 – phuclv Jun 26 '15 at 08:48
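Following up on the double-double suggestion in the comment above, here is a minimal illustrative sketch of the core building blocks (Knuth's two-sum and a double-double addition); it assumes IEEE semantics are preserved (no -ffast-math) and that doubles are not kept in x87 extended precision (the default on x86-64):

#include <cstdio>

// A double-double value: the unevaluated sum hi + lo, with lo much smaller than hi.
struct dd { double hi, lo; };

// Knuth's two-sum: s is the rounded sum, e the exact rounding error.
static dd two_sum(double a, double b)
{
    double s  = a + b;
    double bb = s - a;
    double e  = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Fast two-sum, valid when |a| >= |b|.
static dd quick_two_sum(double a, double b)
{
    double s = a + b;
    return {s, b - (s - a)};
}

// Double-double addition: roughly twice the precision of double, all in hardware.
static dd dd_add(dd x, dd y)
{
    dd s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return quick_two_sum(s.hi, s.lo);   // renormalize
}

int main()
{
    dd a = two_sum(1.0, 1e-20);        // 1 + 1e-20, not representable in one double
    dd b = dd_add(a, {-1.0, 0.0});     // subtract 1 exactly
    std::printf("%g\n", b.hi);         // prints 1e-20: the tiny part survived
}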

3 Answers


Compiler optimizations work like this: the compiler recognizes certain patterns in your code and replaces them with equivalent but faster versions. Without knowing exactly what your code looks like and what optimizations the compiler performs, we can't say what the compiler is missing.

It's likely that many of the optimizations the compiler knows how to perform on native floating-point types and their operations are ones it doesn't know how to perform on __float128 and the library implementations of its operations. It might not recognize these operations for what they are. Maybe it can't look into the library implementations (you should try compiling the library together with your program and enabling link-time optimization).
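To illustrate the point about opaque library calls (a hypothetical sketch, not the question's actual code): with a native type the compiler optimizes the arithmetic instructions themselves, while every __float128 operation is lowered to a runtime call that -O3 treats as a black box:

// One step of an explicit Euler update, written for both types.
long double step_native(long double y, long double h, long double k)
{
    // Compiles to a few x87 instructions that -O3 can inline, schedule
    // and keep in registers across the surrounding code.
    return y + h * k;
}

__float128 step_quad(__float128 y, __float128 h, __float128 k)
{
    // Each operation becomes a library call (e.g. __multf3, __addtf3 on
    // x86-64), whose body the optimizer never sees.
    return y + h * k;
}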

Sebastian Redl

The same optimizations provided substantially the same benefit. The percentage went down just because the math itself took longer.

To believe the optimizations should be the same percentage, you'd have to believe that making the math take longer would somehow make the optimizer find more savings. Why would you think that?

David Schwartz

If your target is the x86 architecture, then in GCC __float128 is an actual quadruple precision FP type, while long double is the x87 80-bit FP type (double extended).

It is reasonable that math with smaller precision types can be faster than math with larger precision types. It is also reasonable that math with native hardware types can be faster than math with non-native types.
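A minimal timing sketch along these lines could make the gap visible (the kernel and iteration count are arbitrary illustrations; basic __float128 arithmetic is supplied by the compiler's runtime support, so a plain g++ -O3 build should suffice):

#include <chrono>
#include <cstdio>

// Time a simple floating-point kernel for a given type.
template <typename T>
static double time_kernel()
{
    auto t0 = std::chrono::steady_clock::now();
    T acc = 0;
    for (int i = 1; i <= 10000000; ++i)
        acc += T(1) / T(i);                               // division keeps the FP unit busy
    auto t1 = std::chrono::steady_clock::now();
    std::printf("sum = %g\n", static_cast<double>(acc));  // keep acc live
    return std::chrono::duration<double>(t1 - t0).count();
}

int main()
{
    std::printf("long double: %.2f s\n", time_kernel<long double>());
    std::printf("__float128 : %.2f s\n", time_kernel<__float128>());
}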

Dwayne Towell
Theodoros Chatzigiannakis