Weird LTO behavior with -ffast-math

Question

Summary

I recently ran into a weird issue with LTO and -ffast-math: my "pow" calls (from cmath) give inconsistent results depending on whether -flto is used.

Environment:

$ g++ --version
g++ (GCC) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ll /lib64/libc.so.6
lrwxrwxrwx 1 root root 12 Sep  3  2019 /lib64/libc.so.6 -> libc-2.17.so

$ ll /lib64/libm.so.6
lrwxrwxrwx 1 root root 12 Sep  3  2019 /lib64/libm.so.6 -> libm-2.17.so

$ cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core)

Minimal Example

Code

fixed.hxx

#include <cstdint>
double Power10f(const int16_t power);

fixed.cxx

#include "fixed.hxx"
#include <cmath>

double Power10f(const int16_t power)
{
    return pow(10.0, (double) power);
}

test.cxx

#include <iostream>
#include <cmath>
#include <iomanip>
#include <cstdint>
#include <cstdlib>  // atoi
#include "fixed.hxx"

int main(int argc, char** argv)
{
    if (argc >= 3) {
        int64_t value = (int64_t)atoi(argv[1]);
        int16_t power = (int16_t)atoi(argv[2]);
        double x = Power10f(power);
        std::cout.precision(17);
        std::cout << std::scientific << x << std::endl;
        std::cout << std::scientific << (double)value * x << std::endl;
        return 0;   
    }
    return 1;
}

Compile & Run

Compiling with -ffast-math gives different results depending on whether -flto is used.

With -flto, the binary ends up calling __pow_finite and gives an "accurate" result:

$ g++ -O3 -DNDEBUG -ffast-math -std=c++17 -flto  -o fixed.cxx.o -c fixed.cxx
$ g++ -O3 -DNDEBUG   -o fdtest fixed.cxx.o test.cxx
$ ./fdtest 81 20
1.00000000000000000e+20
8.10000000000000000e+21
$ objdump -DC fdtest > fdtest.dump
$ cat fdtest.dump
...
0000000000400930 <Power10f(short)>:
  400930:       0f bf ff                movswl %di,%edi
  400933:       66 0f ef c9             pxor   %xmm1,%xmm1
  400937:       f2 0f 10 05 99 00 00    movsd  0x99(%rip),%xmm0        # 4009d8 <_IO_stdin_used+0x8>
  40093e:       00 
  40093f:       f2 0f 2a cf             cvtsi2sd %edi,%xmm1
  400943:       e9 d8 fd ff ff          jmpq   400720 <__pow_finite@plt>
  400948:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40094f:       00
...

Without -flto, the binary ends up calling __exp_finite (an optimization enabled by -ffast-math, if I guess right) and gives an "inaccurate" result:

$ g++ -O3 -DNDEBUG -ffast-math -std=c++17  -o fixed.cxx.o -c fixed.cxx
$ g++ -O3 -DNDEBUG   -o fdtest fixed.cxx.o test.cxx
$ ./fdtest 81 20
1.00000000000000786e+20
8.10000000000006396e+21
$ objdump -DC fdtest > fdtest.dump
$ cat fdtest.dump
...
0000000000400930 <Power10f(short)>:
  400930:       0f bf ff                movswl %di,%edi
  400933:       66 0f ef c0             pxor   %xmm0,%xmm0
  400937:       f2 0f 2a c7             cvtsi2sd %edi,%xmm0
  40093b:       f2 0f 59 05 95 00 00    mulsd  0x95(%rip),%xmm0        # 4009d8 <_IO_stdin_used+0x8>
  400942:       00 
  400943:       e9 88 fd ff ff          jmpq   4006d0 <__exp_finite@plt>
  400948:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40094f:       00
...
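
For what it's worth, the mulsd followed by the jump to __exp_finite in the listing above is consistent with the compiler rewriting pow(10.0, y) as exp(y * log(10.0)), although the dump does not show the value of the constant. Below is a minimal sketch that spells out both forms by hand so they can be compared without any special flags. The names Pow10ViaPow and Pow10ViaExp are hypothetical and not part of the original code.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: the straightforward call, as written in fixed.cxx.
double Pow10ViaPow(std::int16_t power)
{
    return std::pow(10.0, static_cast<double>(power));
}

// Hypothetical helper: the identity pow(10, y) == exp(y * log(10)) applied by
// hand. Assumption: this mirrors the rewrite seen in the non-LTO disassembly.
// Rounding in log(10.0) and in the product is magnified by exp(), so the
// result can differ from Pow10ViaPow in the last few digits.
double Pow10ViaExp(std::int16_t power)
{
    return std::exp(static_cast<double>(power) * std::log(10.0));
}
```

Printing both for power = 20 with std::cout.precision(17) should make any difference in the trailing digits visible; the exact digits depend on the libm in use.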

Question

Is this expected behavior, or is something wrong with my code?

Update

The same result can also be observed on some other platforms ( e.g. ArchLinux with g++ 12.1 and glibc 2.35 ).

On an unrelated note, please try to avoid C-style casts (e.g. `(int64_t)atoi(argv[1])`). C-style casts are often a sign that you're doing something wrong. *If* you need a cast, use `static_cast`, and if that is not possible, use `reinterpret_cast`. – Some programmer dude Jun 28 '22 at 10:01
Also unrelated, but a "power" operation on two (non-negative) integers should always give an integer result. And for integer powers, it's almost always better to implement it yourself as a simple loop with multiplication. Your program (as shown) can be written with all integer operations. – Some programmer dude Jun 28 '22 at 10:03
@Someprogrammerdude: With GCC, `-flto` writes GIMPLE (an internal representation) during the compilation phase. It is definitely not just a linker flag. Of course, the linker does need to parse that GIMPLE. – MSalters Jun 28 '22 at 10:05
@Someprogrammerdude Thanks for the tip. This is just a minimal example that I made up for this SO question. I will use "static_cast" in my real code :) – Liu Wei Jun 28 '22 at 10:07
@Someprogrammerdude: "a "power" operation on two integers should always give an integer result" - except when it overflows. `double` can represent quite a few integer numbers exactly that `std::int64_t` cannot. Even `float` can! – MSalters Jun 28 '22 at 10:07
Also, the `atoi` function does no validation of the input string. Use e.g. `std::stoi` (or `std::stoll` for the 64-bit value) to add validation and to get better-typed results. – Some programmer dude Jun 28 '22 at 10:08
I would not be surprised if this is intended behavior. `-ffast-math` is specifically meant to allow transformations which do not preserve the "correct" value according to specifications. – user17732522 Jun 28 '22 at 10:14
@user17732522 I understand that `-ffast-math` can give less accurate results, and I'm actually expecting that. However, the main question is why `-flto` changes the behavior in this case. – Liu Wei Jun 28 '22 at 10:16
@LiuWei I think you should add the `-ffast-math` option to the linker command as well if you want LTO to produce the same output. LTO delays some optimizations to link time and the compiler needs to figure out what optimization options to use then. I don't remember exactly what GCC does, but either it takes the flags from the linker invocation or it uses a common denominator of the flags used to compile the LTO units. – user17732522 Jun 28 '22 at 10:32
@user17732522 Thanks a lot. That seems to do the trick: LTO builds now give the same result as the non-LTO builds. I think your answer should be accepted. Could you write up a formal answer so I can accept it? – Liu Wei Jun 28 '22 at 11:17
@LiuWei The current top answer is already sufficient. (`-flto` at link time is still required as well if you want to make sure that LTO is correctly applied. At least that is what GCC expects you to do.) – user17732522 Jun 28 '22 at 11:20
@user17732522 Thanks. I will then accept the top answer. And thank you all. – Liu Wei Jun 28 '22 at 11:21
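
The input-handling suggestions in the comments above (`static_cast` instead of C-style casts, `std::stoi` instead of `atoi`) could look roughly like this. ParsePower is a hypothetical helper, not part of the original program, and rejecting trailing characters is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses a decimal exponent and checks that it fits in
// int16_t. Returns false (leaving `out` untouched) on any invalid input.
bool ParsePower(const std::string& text, std::int16_t& out)
{
    try {
        std::size_t consumed = 0;
        const int parsed = std::stoi(text, &consumed);
        if (consumed != text.size()) {
            return false; // trailing characters such as "20x"
        }
        if (parsed < std::numeric_limits<std::int16_t>::min() ||
            parsed > std::numeric_limits<std::int16_t>::max()) {
            return false; // does not fit in int16_t
        }
        out = static_cast<std::int16_t>(parsed);
        return true;
    } catch (const std::invalid_argument&) {
        return false; // no digits at all
    } catch (const std::out_of_range&) {
        return false; // does not even fit in int
    }
}
```

In main, something like `int16_t power; if (!ParsePower(argv[2], power)) return 1;` would then replace the `atoi` call.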

score 6 · Accepted Answer · answered Jun 28 '22 at 10:46


man gcc:

To use the link-time optimizer, -flto and optimization options should be specified at compile time and during the final link. It is recommended that you compile all the files participating in the same link with the same options and also specify those options at link time. For example:
              gcc -c -O2 -flto foo.c
              gcc -c -O2 -flto bar.c
              gcc -o myprog -flto -O2 foo.o bar.o

answered Jun 28 '22 at 10:46

n. m. could be an AI


This looks "partly" correct. :) Adding `-flto` to the link phase did not fix the issue, but passing `-ffast-math` to the link phase did the trick :) (as @user17732522 mentioned in the comments above) – Liu Wei Jun 28 '22 at 11:18

score 0 · Answer 2 · answered Jun 28 '22 at 10:57


-ffast-math gives the compiler permission to be inconsistent for whatever reasons it wants. Modifying even notionally unrelated code in the function could easily lead to pow returning different results thanks to different optimization strategies being chosen. And -flto changes quite a bit about how/when optimization is done, so there's a lot of room for that to happen.

If you care about numerical precision, or numeric consistency, or numerics in general, do not use -ffast-math. The transformations it performs are generally available to you as a programmer, and if you do them yourself, you can rely on their consistency.
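
As one illustration of doing the transformation yourself, a power of ten with an integer exponent can be computed with a plain multiplication loop that never goes through pow or exp. This is only a sketch: Power10Loop is a hypothetical replacement for the question's Power10f, and it assumes that exactness only matters for |power| <= 22, where every intermediate product is exactly representable in a double.

```cpp
#include <cstdint>

// Hypothetical replacement for Power10f: repeated multiplication instead of
// a libm call, so there is no pow() call for the optimizer to rewrite.
double Power10Loop(std::int16_t power)
{
    // Widen first so that negating INT16_MIN cannot overflow.
    const int exponent = power;
    const int magnitude = exponent < 0 ? -exponent : exponent;

    double result = 1.0;
    for (int i = 0; i < magnitude; ++i) {
        result *= 10.0; // exact while result <= 1e22; later steps round
    }

    // Negative exponents: a single division of 1 by the positive power.
    return exponent < 0 ? 1.0 / result : result;
}
```

With this version, Power10Loop(20) is exactly 1e20, since no step in the loop needs rounding. For large exponents the loop accumulates rounding error, so it is not a drop-in replacement for pow in general.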

answered Jun 28 '22 at 10:57

Sneftel


Thanks for the tip. Actually, I was NOT expecting an "accurate" result. I was expecting consistent results with or without LTO. In fact, the result stays the same if I add `-flto` to the final link stage (it is still not consistent with the result without `-flto`). – Liu Wei Jun 28 '22 at 11:09
@LiuWei Exactly. People have been known to turn on -ffast-math because they decided they don't care about accuracy, and then they get bitten by issues with consistency. In this particular case you were able to get consistent behavior; in other cases you won't. See https://stackoverflow.com/questions/7295861/enabling-strict-floating-point-mode-in-gcc for how you get consistent behavior. – Sneftel Jun 28 '22 at 11:15