I recently upgraded my OS from Debian 9 to Debian 11. I have a bunch of servers running a simulation, and since the upgrade one subset produces one result while another subset produces a different result. This did not happen with Debian 9. I have produced a minimal failing example:
#include <stdio.h>
#include <math.h>

int main(void)
{
    double lp = 11.525775909423828;
    double ap = exp(lp);
    printf("%.14f %.14f\n", lp, ap);
    return 0;
}
The lp value prints the same on every machine, but I get two different answers for ap: 101293.33662281210127 and 101293.33662281208672.
The code was compiled with "gcc fptest.c -lm -O0". The '-O0' was just added to ensure optimizations weren't an issue. It behaves the same without this option.
The libraries linked in the Debian 11 version are libm-2.31.so and libc-2.31.so.
The libraries linked in the (working) Debian 9 version are libm-2.24.so and libc-2.24.so.
The servers all have different CPUs, so it's hard to say much about that. But I get different results between a Xeon E5-2695 v2 and a Xeon E5-2695 v3, for example.
Across all the processors I have, each one gives one of these two results on Debian 11, whereas on Debian 9 every machine consistently gives the same single result.
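To help group the machines, here is a minimal sketch that reports whether the running CPU advertises a couple of instruction-set extensions. It is only a diagnostic aid. That FMA or AVX2 support has anything to do with the differing exp() results is an assumption on my part, not something I have confirmed. The helper names (cpu_has_fma, cpu_has_avx2, print_cpu_features) are made up for this example; __builtin_cpu_init and __builtin_cpu_supports are GCC/clang builtins for x86.

```c
#include <stdio.h>

/* Returns 1 if the running CPU reports FMA support, 0 otherwise.
   Assumption: FMA is a feature worth comparing across machines. */
static int cpu_has_fma(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();                      /* safe to call more than once */
    return __builtin_cpu_supports("fma") != 0;
#else
    return 0;                                  /* builtin is x86-only */
#endif
}

/* Returns 1 if the running CPU reports AVX2 support, 0 otherwise. */
static int cpu_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Print one line per machine so the output can be compared side by side. */
static void print_cpu_features(void)
{
    printf("fma=%d avx2=%d\n", cpu_has_fma(), cpu_has_avx2());
}
```

Calling print_cpu_features() from main() on each server, next to the exp() output, would show whether the two result groups line up with a CPU feature.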
This feels like a bug in libm-2.31 and/or libc-2.31 to me. I have zero experience with this sort of thing. Could someone please explain if what I am seeing is expected? Does it look like a bug? Anything I can do about it? etc.
I also tried compiling with clang and got exactly the same problem.
Also note that the binary compiled on Debian 9 runs on Debian 11 and produces the same results (and the same problem) as the Debian 11 binary, which adds further weight to my suspicion that this is library related. (I cannot run the Debian 11 binary on Debian 9.)
Update
I just read this post, which was helpful. So I accept that different architectures may give different results for the exp() function. But all my processors are x86_64 and some kind of Intel Xeon. I can't understand why the exact same binary with the exact same libraries gives different results on different processors.
As suggested in that post, I printed the values using %a. The two answers differ only in the least significant bit (LSB). If I use expl() I get the same answer on all machines.
An explanation of why I'm seeing differences, and if this is expected, would be nice. Any compiler flags that ensure consistency would also be nice.