floating point processor non-determinism?

Question

Without getting into unnecessary details, is it possible for operations on floating-point numbers (x86_64) to return variations in their results, however small, given identical inputs? Even a single bit of difference?

I am simulating what is basically a chaotic system, and I expect small variations in the data to have visible effects. However, I expected that, with the same data, the behavior of the program would be fixed. This is not the case: I get visible, but acceptable, differences with each run of the program.

I am thinking I have left some variable uninitialized somewhere...

The languages I am using are C++ and Python.

ANSWER

Russell's answer is correct. Floating point ops are deterministic. The non-determinism was caused by a dangling pointer.

Can you provide an [SSCCE](http://sscce.org/)? – Mysticial Jul 15 '12 at 06:50
I am afraid not, it's a fairly complex program. :-( – Panayiotis Karabassis Jul 15 '12 at 06:52

score 5 · Answer 1 · answered Jul 15 '12 at 07:04

Yes, this is possible. Quoting from the C++ FAQ:

Turns out that on some installations, cos(x) != cos(y) even though x == y. That's not a typo; read it again if you're not shocked: the cosine of something can be unequal to the cosine of the same thing. (Or the sine, or the tangent, or the log, or just about any other floating point computation.)

Why?

[F]loating point calculations and comparisons are often performed by special hardware that often contain special registers, and those registers often have more bits than a double. That means that intermediate floating point computations often have more bits than sizeof(double), and when a floating point value is written to RAM, it often gets truncated, often losing some bits of precision.
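To make the truncation in the quote concrete, here is a minimal sketch that uses `long double` as a stand-in for a wider hardware register. The helper name `narrowing_changes_value` is made up for illustration, and the result depends on the platform: where `long double` is no wider than `double` (some compilers do this), nothing is lost.

```cpp
#include <cfloat>

// Hypothetical helper: reports whether rounding a wide intermediate value
// to double changes it, which is the "losing some bits" step in the quote.
bool narrowing_changes_value() {
    // 1 + LDBL_EPSILON is the smallest long double greater than 1.0.
    long double wide = 1.0L + LDBL_EPSILON;
    // Converting to double rounds to the nearest representable double.
    double narrow = static_cast<double>(wide);
    // Widening back is exact, so any difference comes from the rounding above.
    return static_cast<long double>(narrow) != wide;
}
```

Note that the rounding itself is a fixed rule: the same wide value always narrows to the same `double`. What varies between two pieces of code is whether and where the narrowing happens.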

Thomas

Thanks, I could not find any uninitialized variables, so this must be it. – Panayiotis Karabassis Jul 15 '12 at 07:19
That's not non-determinism, though. The same inputs into the same code should give the same results. The explanation you give here allows for two superficially similar blocks of code to yield two different results from the same inputs, but that's a different thing entirely. – Russell Borogove Jul 15 '12 at 07:55
A context switch could result in registers being swapped out to RAM and back in, and context switches can occur at "nondeterministic" moments (from the program's point of view). Therefore, the same code could give different results depending on when a context switch happens. Or am I misunderstanding your point? – Thomas Jul 15 '12 at 09:58
@Thomas: Context switches do not change registers, as visible by a process. The operating system saves and restores all state of the architecture-specified programming environment, including all bits of all regular registers. (A few special things may change, such as the lock state of load-and-reserve and store-conditional instructions.) Consider that these “extra” bits are unspecified in languages like C, but they are absolutely part of the programming environment for assembly-language programmers and must be preserved by the operating system. – Eric Postpischil Jul 15 '12 at 11:08
Has this behavior actually been observed in a high-quality compiler that is largely conforming to C/C++? If cos were written as a C or C++ function, the language standards require that floating-point values be cast to the semantic type by the return statement. In other words, even if the compiler calculates with extended precision, the return value must be converted to double. Thus, given x==y, cos(x)==cos(y); after the return, there are no extra bits to retain or lose. A better example might be `x+y==x+y`, since this expression has no return to force a conversion. – Eric Postpischil Jul 15 '12 at 11:31
One note on the above: The language standards may be unclear on whether the semantics of library functions are as if they were written in the language itself. The C standard, for example, contains phrases about library functions returning, but one might argue that this return is not the same as a C return statement. The library function may be written in another language, not using C semantics. However, it could be a poor implementation choice for the library implementation of cos to return a value not representable in double. – Eric Postpischil Jul 15 '12 at 11:34
@Thomas: I wanted to ask, do you know if the truncation from internal representation to RAM happens in a non-deterministic way? – Panayiotis Karabassis Jul 16 '12 at 08:11
I dare not say anymore. Several people have been commenting who seem to know more about it than I do :) – Thomas Jul 16 '12 at 17:16
Thanks, I solved it, it was a dangling pointer, See my comment on the other answer. Russell was right, floating point ops are (should be) deterministic across runs, even though superficially similar code can return different results in the same run. There's valuable insight in your answer, thanks. – Panayiotis Karabassis Jul 18 '12 at 09:05
While the references are true, this has nothing to do with determinism. Determinism is not related to precision, but to consistency. In reality, common language standards generally don't guarantee consistency in floating-point calculations, even on the same hardware, but in practice it usually works. Determinism is the Holy Grail of programs; everything should be done to avoid losing this fantastic property. For floats, this is a cross-platform issue related to the lack of standardized ways of coping with different FPU behaviors. It will be sorted out eventually (I hope I'll live long enough). – Alex Jan 31 '16 at 18:20
@PanayiotisKarabassis Conversions from "extended precision" in FPU registers to RAM occur when the compiled code says they do. Read the disassembly; there is no non-determinism in it. The problem is that this is outside the control of the programmer, which is a semantic catastrophe. – curiousguy Nov 06 '18 at 02:23

score 4 · Accepted Answer · answered Jul 15 '12 at 08:03

Contra Thomas's answer, floating point operations are not non-deterministic. They are fiendishly subtle, but a given program should give the same outputs for the same inputs, if it is not using uninitialized memory or deliberately randomized data.

My first question is, what do you mean by "the same data"? How is that data getting into your program?

Russell Borogove

Non-determinism may occur in multi-threaded programs which distribute floating-point jobs among multiple threads, if the timing of the threads affects the order in which results are incorporated into subsequent calculations. Non-determinism may also occur between different compilers, use of different compiler switches, compilations of the same program for different targets, different compiler versions, and even different compilations of the same source with identical switches (the compiler itself may be non-deterministic). – Eric Postpischil Jul 15 '12 at 11:43
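As a rough illustration of why the order of incorporation matters: floating-point addition is not associative, so the same values summed in a different order can give a different total. The function name `sum_in_order` is hypothetical, and the sketch assumes IEEE 754 doubles with no value-changing optimizations such as `-ffast-math`.

```cpp
#include <vector>

// Sums strictly left to right, the way a reduction would if partial
// results happened to arrive in this order.
double sum_in_order(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v;  // each step rounds to the nearest double
    }
    return total;
}
```

With this, `sum_in_order({1e20, 1.0, -1e20})` yields `0.0` because the `1.0` is absorbed by `1e20`, while `sum_in_order({1e20, -1e20, 1.0})` yields `1.0`. Each ordering on its own is perfectly repeatable; only the ordering changes the answer.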
Well, the "data" is built into the program: a set of initial conditions is generated from constant literals, and then the simulation proceeds deterministically. No user input is accepted, no timer is read, no pointers are converted to ints, etc. To the best of my knowledge. Compiling with gcc's -Weffc++ reports no uninitialized data. The simulation happens entirely in C++ and is only visualized by Python. My other concern is iterator access, but as far as I know that happens deterministically too (first to last in std::vector, which is the only collection I use). – Panayiotis Karabassis Jul 15 '12 at 12:02
The program is not recompiled between runs. BTW what about Thomas's quotation? Is it plainly wrong? Because it mentions the same installation, and the example cos(x) != cos(y) suggests a single run. – Panayiotis Karabassis Jul 15 '12 at 12:04
I think Thomas's quotation is pointing out that `cos(x)` can `!= cos(x)` for *different calls* to `cos()` in the same program, in contexts where register allocation or other compiler optimization pressures are different. A single call to `cos()` at a single point in the program compiles to a particular sequence of machine instructions, and those instructions produce the same output from the same input. – Russell Borogove Jul 15 '12 at 18:00
Eric, those are good points except you're using the term non-determinism way, way too loosely. In the multithreading case, the order of thread execution, while unpredictable and unknown, *is one of the inputs* to the algorithm, and hence the result is deterministic for given inputs. – Russell Borogove Jul 15 '12 at 18:02
In regard to your problem, I recommend you divide and conquer, checking your intermediate results as your algorithm works, until you can isolate a case where identical inputs are giving different outputs. At that point it will likely be blindingly obvious what subtlety we've all forgotten to mention so far. – Russell Borogove Jul 15 '12 at 18:06
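One possible way to check intermediate results is to compare raw bit patterns instead of printed decimals, since printing can hide a difference in the last bit. This is only a sketch: `bits_of` and `same_bits` are hypothetical helpers, and it assumes a 64-bit IEEE 754 `double`.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw 64-bit pattern of a double so that values logged in two
// runs can be compared exactly.
std::uint64_t bits_of(double value) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch expects a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined, unlike a pointer cast
    return bits;
}

// True only when both values are bit-for-bit identical
// (so +0.0 and -0.0 count as different).
bool same_bits(double a, double b) {
    return bits_of(a) == bits_of(b);
}
```

Logging `bits_of(x)` at a few checkpoints in each run and diffing the logs narrows down the first step at which two runs diverge.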
I have been able to isolate it to this: `double d1=foo(); double d2=foo(); d1==d2;` This one says they are equal. Then: `double d1=foo(); std_vector_unrelated_operation(); double d2=foo(); std_vector_unrelated_operation(); d1==d2;` And this one says they are not! By Thomas's explanation this means that the vector operations cause d1 and d2 to be written to RAM. Assuming that their internal (hardware) representations were the same, it suggests that the conversion happens in a non-deterministic way. Does this sound plausible? – Panayiotis Karabassis Jul 16 '12 at 08:15
As far as I know the register/memory conversion should be deterministic. Do d1 and d2 agree to 16 digits or so, or is there something more drastic going on here? – Russell Borogove Jul 16 '12 at 15:47
Damn. This is destroying my mental health. :-D It was uninitialized data, of a sort. I meant to take a reference to something. Instead I made a copy on the stack (a single & missing in the argument list declaration). The copy was released, and I had taken its address. Now that this is fixed, the simulation seems to have become deterministic. Why C++ would segfault on my error, and the same code, called as a Python module, would run, I don't really want to know. So thanks, you were right. – Panayiotis Karabassis Jul 18 '12 at 09:02
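For readers hitting the same thing, the bug class described here looks roughly like the sketch below. The types `Body` and `Tracker` are invented for illustration and are not the original program's code; the buggy form is shown only in a comment because running it is undefined behavior.

```cpp
// Hypothetical types, used only to illustrate the missing '&'.
struct Body {
    double position;
};

struct Tracker {
    const Body* target = nullptr;

    // Buggy form (missing '&'):
    //     void follow(Body b) { target = &b; }
    // Here b is a by-value copy that is destroyed when follow() returns,
    // so target dangles and later reads see whatever reuses that stack slot.

    // Fixed form: b refers to the caller's object, so &b stays valid for
    // as long as the caller keeps that object alive.
    void follow(const Body& b) { target = &b; }
};
```

Reading through a dangling pointer like this can return different garbage on each run, which is consistent with results that vary even though the floating-point code itself is unchanged.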
Glad to know I wasn't crazy... :) – Russell Borogove Jul 18 '12 at 16:07
@RussellBorogove "_A single call to cos() at a single point in the program_" is not a thing. There is no "point in a program". There is no such guarantee, only a false expectation. That's why the situation is absolutely catastrophic and yet almost nobody cares. – curiousguy Nov 06 '18 at 02:30
@curiousguy In at least certain cases, for statically compiled languages, a single `cos()` in the source is isomorphic with a particular sequence of instructions, or even a single `fsincos()`; those instructions are deterministic for a given machine state at the time of execution. – Russell Borogove Nov 06 '18 at 02:42
@RussellBorogove Even when compiling with debug support, it is not the case with GCC that a "single point in the program" corresponds to a single address in object code. It may be the case with `-O0`, but I don't think it's absolutely sure. – curiousguy Nov 06 '18 at 03:11
