
I recently upgraded my OS from Debian 9 to Debian 11. I have a bunch of servers running a simulation; one subset produces a certain result and another subset produces a different result. This did not happen with Debian 9. I have produced a minimal failing example:

#include <stdio.h>
#include <math.h>

int main()
{
  double lp = 11.525775909423828;
  double ap = exp(lp);

  printf("%.14f %.14f\n", lp, ap);

  return 0;
}

The lp value prints the same on every machine, but I get two different answers for ap: 101293.33662281210127 and 101293.33662281208672.

The code was compiled with "gcc fptest.c -lm -O0". The '-O0' was added just to ensure optimizations weren't an issue; it behaves the same without this option.

The libraries linked in the Debian 11 version are libm-2.31.so and libc-2.31.so.

The libraries linked in the (working) Debian 9 version are libm-2.24.so and libc-2.24.so.

The servers are all running different CPUs, so it's hard to say much about that. But I get different results between a Xeon E5-2695 v2 and a Xeon E5-2695 v3, for example.

Amongst all the processors I have, each machine consistently gives one of these two results on Debian 11, and when running on Debian 9 I consistently get only one result.

This feels like a bug in libm-2.31 and/or libc-2.31 to me, but I have zero experience with this sort of thing. Could someone please explain whether what I am seeing is expected? Does it look like a bug? Is there anything I can do about it?

I also tried compiling with clang, and got exactly the same problem.

Also note that the binary compiled on Debian 9 runs on Debian 11 and produces the same results/problem as the Debian 11 binary, adding further weight to my suspicion that this is library related. (I cannot run the Debian 11 binary on Debian 9.)

Update

I just read this post, which was helpful. So I'm happy that different architectures may give different results for the exp() function. But all my processors are x86_64, some kind of Intel Xeon-xxxx. I can't understand why the exact same binary with the exact same libraries gives different results on different processors.

As suggested in that post, I printed the values using %a. The two answers differ only in the LSB. If I use expl() I get the same answer on all machines.

An explanation of why I'm seeing differences, and if this is expected, would be nice. Any compiler flags that ensure consistency would also be nice.

user3856370
  • Given different results from the same binary and libraries on different processors, a first step would be to single-step through the `exp` call in a debugger in two machines side-by-side. See if the software is executing the same instructions or different instructions. If it executes the same instructions, the next step is to watch the data (including full precision with all floating-point bits visible) during single-stepping and see where different data is produced. Then dive into the Intel documentation for the processor models and see what is different. – Eric Postpischil Feb 28 '22 at 15:59
  • If it executes different instructions, then the library was written to be sensitive to the processor model. That would be unusual for a Xeon E5-2695 v2 versus Xeon E5-2695 v3 difference. Then dive into the source code for the library to see why it is making that choice. – Eric Postpischil Feb 28 '22 at 16:01
  • "If I use expl() I get the same answer on all machines." --> What was that same answer? – chux - Reinstate Monica Feb 28 '22 at 20:06
  • @chux-ReinstateMonica 101293.33662281210127 or 0x1.8bad562ce9a11p+16 with %a – user3856370 Feb 28 '22 at 20:09
  • @user3856370 Did you save and print the result of `expl()` into a `double` or `long double`? It appears you used `double`. Did you get the same answer when saving/printing with `long double`? – chux - Reinstate Monica Feb 28 '22 at 20:16
  • @chux-ReinstateMonica I used double, but long double gives the same answer (printing with %.14Lf) – user3856370 Feb 28 '22 at 20:28
  • user3856370 Step back for a higher-level view. The goal is to detect errors. Comparative code testing is a way to detect error candidates. Did the same thing happen on both machines? If not, an "error" is suspected. Logic holes include: if the results are the same, both might still be wrong. Another hole: different does not necessarily mean wrong, just degrees of correctness. In the case of `exp()`, the degree threshold is _not_ the best answer, but only near the best answer. The _best_ answer is ever increasingly more work - work met more often, and _different_, as time goes on with better CPUs/algorithms. – chux - Reinstate Monica Mar 01 '22 at 00:02

4 Answers


Background

Some notes to clarify the results of e^11.525775909423828.

11.525775909423828 is not exactly representable as a double; instead a nearby value of 0x1.70d328p+3 is used, or exactly 11.525775909423828125, so we really are investigating e^11.525775909423828125 and how well exp() performs.

We have 3 results: 2 from various compilers' exp() and a 3rd, mathematically exact one. The 2 C code answers are 1 ULP apart, and the math one is between them:

                                              0072831    difference from math
0x1.8bad562ce9a10      p+16  101293.336622812 0867163... OP's smaller answer
0x1.8bad562ce9a10802...p+16  101293.336622812 0939994... Math
0x1.8bad562ce9a11      p+16  101293.336622812 1012682... OP's larger answer
                                              0072688    difference from math

The math answer is very nearly halfway between the two representable doubles, with the larger of OP's answers being about 1/1000 ULP closer.

Candidate contributors to a difference

  • User code and math libraries are not exactly the same as OP supposes. (IMO - most likely)

  • Code uses the processor's built-in hardware instructions for e^x, and those render different answers. (possible)

  • Inherited rounding modes are not the same. (doubtful)


Getting consistent answers - to the last bit - is quite challenging in this case. A common issue is that newer processors and support libraries tend to render better answers than before. This is part of the table-maker's dilemma, and test code should be tolerant of these minute (<= 1.0 ULP) differences in transcendental functions like exp().


Note that using long double may not help, as some implementations encode long double exactly the same as double.

chux - Reinstate Monica
  • Thanks for your input. Regarding "User code and math libraries are not as exactly the same as OP supposes". I can assure you they are. I have tried this with a statically linked executable. That same binary gives two different answers on two different xeon processors. – user3856370 Feb 28 '22 at 21:11
  • @user3856370 Good that you are diligent. Sounds like the processors' HW/microcode differs then. Perhaps extract their revisions. Curious, given a difference, what is your preferred reference value: the older one, more common one or the more correct/best one? I find the question focus on difference and not correctness interesting as post does not identify which case is providing which answer. – chux - Reinstate Monica Feb 28 '22 at 21:29
  • For my application that kind of accuracy is not important. These tiny errors do build up over the course of the simulation into errors which are noticeable in the first decimal place (or rarely, even worse) but in the grand scheme of what I'm trying to do they are unimportant. I do however value repeatability. This is a simulation with hundreds of inputs and it's very easy to mess things up. Whenever I can't reproduce a previous result I worry that I've messed up the input files. or worse still that I've introduced some kind of memory fault into the code which means I'm picking up random .. – user3856370 Feb 28 '22 at 21:37
  • ...data from somewhere. That's happened before. Though the changes were much less subtle then. – user3856370 Feb 28 '22 at 21:37
  • @user3856370 Note that "These tiny errors" --> the "101293.336622812 0867163" one is the weak result - hate to call it an "error". The "101293.336622812 1012682" one is "correct". If you want absolute consistency, do not use `exp()`, but one crafted from your own code without library calls. Yet in that case I suspect the speed, correctness and cost will suffer, but at least you will have consistency. I feel for your "Whenever I can't reproduce a previous result" dilemma, but with FP code that is simply something one must cope with, and re-write tests to tolerate tiny differences. – chux - Reinstate Monica Feb 28 '22 at 21:48

Finally figured out what's going on.

Before my code even begins, __libc_start_main makes a call to __ieee754_exp_ifunc. This function appears to select which implementation will handle exp(). On one set of machines it selects __ieee754_exp_avx and on the other set it selects __ieee754_exp_fma.

__ieee754_exp_ifunc has a comment about _dl_x86_cpu_features, so it seems that the exponential function implementation is being chosen based on the processor's capabilities. I didn't dig any further than that.

In Debian 11, on machines which have both avx and fma capability, fma is chosen. For the subset of my machines that don't have fma it obviously uses avx. However in Debian 9 it used avx even on processors which have fma.

Finally, I tried compiling with gcc using '-mavx'. It gave that a good ignoring and still used fma on the machines that have it. I wonder if/how I can force avx? I guess that's a question for another post.

user3856370
  • Consider detailing what "avx" and "fma" imply here. C has `fma()`, yet the "fma" here seems to be the similar _fused multiply-add_ processor instruction. Sadly `fma()` is sometimes [poorly implemented](https://stackoverflow.com/q/42166563/2410359). – chux - Reinstate Monica Mar 01 '22 at 16:53

With a few exceptions, glibc does not aim for correctly rounded math functions.

This means that results can vary slightly between different glibc versions and different processors. Processor differences can result from different implementations based on processor capabilities, for example depending on whether the processor implements FMA instructions.

Florian Weimer

It’s not a bug. Floating-point arithmetic has rounding errors. For the single arithmetic operations + - * / sqrt the results should be the same everywhere, since IEEE 754 requires them to be correctly rounded, but for the floating-point library functions you can’t really expect it.

In this case it seems the compiler itself produced the results at compile time. The processor you use is unlikely to make a difference. And we don’t know whether the new version is more or less precise than the old one.

gnasher729
  • The 101293.33662281210127 result is more accurate; the `double` nearest 11.525775909423828 is 11.525775909423828125, and the `double` nearest e^11.525775909423828125 is 101293.336622812101268209517002105712890625. – Eric Postpischil Feb 28 '22 at 13:23
  • Note that the different results here are not due to “Floating point arithmetic has rounding errors” but are due to the imperfect implementation of `exp`. Errors caused by rounding results to the nearest representable value are caused by the floating-point format. But `exp` could be implemented to return the nearest representable value; the fact that it does not is not a limitation of the floating-point format. It is a choice made by its implementors. – Eric Postpischil Feb 28 '22 at 13:25
  • @EricPostpischil Thanks for the comment. Not sure I fully understand though. I have the same binary with the same libraries just running on different processors and giving different results. Is it that different processors somehow calculate the exp() function differently? Why was everything consistent with my old OS? – user3856370 Feb 28 '22 at 13:32
  • @gnasher I don't think the answer is being evaluated at compile time. The assembly for both versions has a call to 'call exp@PLT'. And, the processor is definitely making a difference. I understand that rounding errors exist etc, but I would expect them to be the same errors on every machine I run on. This is not the case. – user3856370 Feb 28 '22 at 13:36