12

I wrote a small program to compute the Euclidean norm of a 3-coordinate vector. Here it is:

#include <array>
#include <cmath>
#include <iostream>

template<typename T, std::size_t N>
auto norm(const std::array<T, N>& arr)
    -> T
{
    T res{};
    for (auto value: arr)
    {
        res += value * value;
    }
    return std::sqrt(res);
}

int main()
{
    std::array<double, 3u> arr = { 4.0, -2.0, 6.0 };
    std::cout << norm(arr) - norm(arr) << '\n';
}

On my computer, it prints -1.12323e-016.


I know that floating point types should be handled with care. However, I thought that floating-point operations were at least somehow deterministic. This article about floating-point determinism states that:

Some of the things that are guaranteed are the results of addition, subtraction, multiplication, division, and square root. The results of these operations are guaranteed to be the exact result correctly rounded (more on that later) so if you supply the same input value(s), with the same global settings, and the same destination precision you are guaranteed the same result.

As you can see, the only operations that this program does on floating-point values are addition, subtraction, multiplication and square root. If I trust the article I quoted above, considering that it runs in a single thread and that I do not change the rounding modes or anything else floating-point related, I thought that norm(arr) - norm(arr) would be 0 since I do exactly the same operations on the same values twice.

Are my assumptions wrong, or is this a case of compiler not strictly conformant with regard to IEEE floating-point math? I am currently using MinGW-W64 GCC 4.9.1 32 bits (I tried every optimization level from -O0 to -O3). The same program with MinGW-W64 GCC 4.8.x displayed 0, which is what I would have expected.


EDIT: I disassembled the code. I won't post the whole generated assembly because it is too big. However, I believe that the relevant part is here:

call    ___main
fldl    LC0
fstpl   -32(%ebp)
fldl    LC1
fstpl   -24(%ebp)
fldl    LC2
fstpl   -16(%ebp)
leal    -32(%ebp), %eax
movl    %eax, (%esp)
call    __Z4normIdLj3EET_RKSt5arrayIS0_XT0_EE
fstpl   -48(%ebp)
leal    -32(%ebp), %eax
movl    %eax, (%esp)
call    __Z4normIdLj3EET_RKSt5arrayIS0_XT0_EE
fsubrl  -48(%ebp)
fstpl   (%esp)
movl    $__ZSt4cout, %ecx
call    __ZNSolsEd
subl    $8, %esp
movl    $10, 4(%esp)
movl    %eax, (%esp)
call    __ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c
movl    $0, %eax
movl    -4(%ebp), %ecx
.cfi_def_cfa 1, 0
leave

As you can see, __Z4normIdLj3EET_RKSt5arrayIS0_XT0_EE is called twice, therefore, it is not inlined. I don't understand the whole thing though, and can't tell what is the problem.

Morwenn
  • 21,684
  • 12
  • 93
  • 152
  • Is `T res{};` guaranteed to initialize `res` to `0`? – François Moisan Sep 17 '14 at 13:14
  • 1
    `The same program with MinGW-W64 GCC 4.8.x displayed 0` - at least on 4.8.0 I also get `-1.12323e-016` – Andreas Fester Sep 17 '14 at 13:15
  • 2
    @FrançoisMoisan: Yes, value-initialisation of numeric types gives zero. – Mike Seymour Sep 17 '14 at 13:15
  • 1
    @Andreas I think that I used 4.8.1 or 4.8.2, but I had some many different versions that I can't remember exactly. – Morwenn Sep 17 '14 at 13:16
  • What optimization settings are you using? – T.C. Sep 17 '14 at 13:19
  • 1
    @T.C. I tried them all (well, not `-Of` or `-Og` but every optimization level from `-O0` and `-O3`). And I made sure not to be using `-ffast-math` or equivalents. – Morwenn Sep 17 '14 at 13:21
  • Looking at the assembly might be interesting. Probably both function calls were inlined, and optimised by replacing additions with subtractions for the second calculation, rather than calculating a second sum and subtracting that. – Mike Seymour Sep 17 '14 at 13:22
  • 2
    Interesting, I can't reproduce this with my MinGW 4.8.2 or 4.9.1 – T.C. Sep 17 '14 at 13:23
  • Compiling with a 64-bit mingw-w64 gcc with the OP code gives the expected 0 output – Niall Sep 17 '14 at 13:24
  • 3
    @MatthiasB seems to be on the right track, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12331 – Andreas Fester Sep 17 '14 at 13:32
  • FWIW, `long double` gives the expected outcome (even on 32-bit). – Niall Sep 17 '14 at 13:38
  • @Niall I tested on 64-bit - probably why. The distro I'm using doesn't support cross-compiling to 32-bit. – T.C. Sep 17 '14 at 13:44
  • @Morwenn The assembly code you posted - is that **definitely** the code from the case which produces the strange output? There is a significant difference between 32 bit and 64 bit - your code is 64 bit which uses the `SSE` registers, while 32 bit code does not – Andreas Fester Sep 17 '14 at 13:59
  • 2
    @Andreas It seems that you are right. Too many compilers on my computer, the one launched from Code::Blocks gives the error (32 bits MinGW-w64), the one from my command-line wasn't the same: it was a 64 bits MinGW-w64 GCC 4.9.1 compiler, which gives 0. I will fix this and post the faulty assembly. Sorry for that, I was sure that I deleted the 64 bits one. – Morwenn Sep 17 '14 at 14:13
  • I bet this is the result of different register sizes on the underlying hardware. – Martin York Sep 22 '14 at 23:54

2 Answers2

11

As @MatthiasB pointed out, this seems to be an issue of gcc temporarily storing an 80 bit floating point value into a 64 bit register/memory location. Consider the following simplified program which still reproduces the issue:

#include <cmath>
#include <iostream>

double norm() {
    double res = 4.0 * 4.0 + (-2.0 * -2.0) + (6.0 * 6.0);
    return std::sqrt(res);
}

int main() {
    std::cout << norm() - norm() << '\n';
    return 0;
}

The assembly code of the essential part norm() - norm() looks like this (using 32 bit mingw 4.8.0 compiler)

...
call    __Z4normv     ; call norm()
fstpl   -16(%ebp)     ; store result (80 bit) in temporary (64 bit!)
call    __Z4normv     ; call norm() again
fsubrl  -16(%ebp)     ; subtract result (80 bit) from temporary (64 bit!)
...

Essentially, I would consider this a gcc bug, but it seems to be a complicated topic ...

Andreas Fester
  • 36,091
  • 7
  • 95
  • 123
  • In the question, the result of the call is in `%xmm0`, definitely not in a 80-bit register. – Pascal Cuoq Sep 17 '14 at 13:44
  • 1
    OP posted the assembly code after I did - have not checked it yet ;) – Andreas Fester Sep 17 '14 at 13:44
  • 1
    I realize it (the OP carefully said it was an edit), but now, with the assembly code, it is completely unexplainable, or at least does not seem to admit a non-far-fetched explanation (the most likely seems that `std::sqrt` called at the end of `norm` changes the FPU control register!) – Pascal Cuoq Sep 17 '14 at 13:48
  • Yep, in OPs assembly both results go through 64 bit registers rax and rbx - the precision should be the same then ... – Andreas Fester Sep 17 '14 at 13:53
  • 2
    Another edit later, it is clear why the value being displayed is not zero. The first result goes through `-48(%ebp)` and the second remains in the 80-bit register. – Pascal Cuoq Sep 17 '14 at 14:32
6

There is a difference in precision of a floating point number depending on where it is stored. If the compiler keeps one variable in a register, it has higher precision as a variable stored in memory. You can try to force your variables to be stored in memory before, like:

int main()
{
    std::array<double, 3u> arr = { 4.0, -2.0, 6.0 };
    volatile double v1 = norm(arr);
    volatile double v2 = norm(arr);
    std::cout << v1 - v2 << '\n';
}

This gives you the expected result of 0.

MatthiasB
  • 1,759
  • 8
  • 18