
Background

We're working on an RTS game engine using C# and .NET Core. Unlike most other real-time multiplayer games, RTS games tend to work by synchronizing player inputs to other players, and running the game simulation in lockstep on all clients at the same time. This requires game logic to be deterministic so that games don't get out of sync.

One potential source of non-determinism is floating point operations. From what I've gathered, the primary issue is with the old x87 FPU instructions: they use internal 80-bit registers, while IEEE-754 floating point values are 32-bit or 64-bit, so values are truncated when moved from registers to memory. Small changes to the code and/or the compiler can change when that truncation happens, producing slightly different results. Non-determinism can also be caused by accidentally using different FP rounding modes, though if I've understood correctly this is mostly a solved issue.

I've also gotten the impression that SSE(2) instructions do not suffer from the truncation issue, as they perform all floating point arithmetic in 32- or 64-bit without a higher precision register.

Finally, as far as I know the CLR uses x87 FPU instructions on x86 (or that was at least the case before RyuJIT), and SSE instructions on x86-64. I'm not sure whether that applies to all operations or only to most of them.

Support for accurate single precision math has recently been added to .NET Core, if that matters.
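
For illustration, here's a minimal sketch of keeping gameplay math entirely in single precision with System.MathF (available since .NET Core 2.0); I'm assuming that is the API the recent single-precision work refers to:

```csharp
using System;

static class SinglePrecisionMath
{
    // Length of a 2D vector computed entirely in 32-bit floats.
    // MathF.Sqrt keeps the whole computation in single precision,
    // whereas Math.Sqrt would round-trip through double.
    public static float Length(float dx, float dy)
        => MathF.Sqrt(dx * dx + dy * dy);
}
```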

But when researching whether or not floating point can be used deterministically in .NET there are a lot of answers that say no, although they mostly concern older versions of the runtime.

  • In a StackOverflow answer from 2013 Eric Lippert said that if you want to guarantee reproducible arithmetic in .NET, you should "Use integers".
  • In a discussion about the subject on Roslyn's GitHub page, a game developer commented in 2017 that they were unable to achieve repeatable floating point operations in C#, though they did not specify which runtime(s) they used.
  • In a 2011 Game Development Stack Exchange answer, the author concludes that he was unable to attain reliable FP arithmetic in .NET, and provides a software-based floating point implementation for .NET that is binary compatible with IEEE-754 floating point.

The question

So, if CoreCLR uses SSE FP instructions on x86-64, does that mean that it doesn't suffer from the truncation issues, and/or any other FP-related non-determinism? We are shipping .NET Core with the engine so every client would use the same runtime, and we would require that the players use exactly the same version of the game client. Limiting the engine to only work on x86-64 (on PC) is also an acceptable limitation.

If the runtime still uses x87 instructions with unreliable results, would it make sense to use a software float implementation (like the one linked in an answer above) for computations on single values, and accelerate vector operations with SSE using the new hardware intrinsics? I've prototyped this and it seems to work, but is it unnecessary?
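
A rough sketch of what I mean (not the actual prototype; the intrinsics are the preview System.Runtime.Intrinsics.X86 API from the .NET Core 3.0 previews, so names may still change):

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class VectorMath
{
    // Adds two 4-lane float vectors with a single SSE addps when available;
    // the scalar fallback would have to go through the software float path
    // to stay deterministic on machines without SSE.
    public static Vector128<float> Add(Vector128<float> a, Vector128<float> b)
    {
        if (Sse.IsSupported)
            return Sse.Add(a, b);

        return Vector128.Create(
            a.GetElement(0) + b.GetElement(0),
            a.GetElement(1) + b.GetElement(1),
            a.GetElement(2) + b.GetElement(2),
            a.GetElement(3) + b.GetElement(3));
    }
}
```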

If we can just use normal floating point operations, is there anything we should avoid, like trigonometric functions?

Finally, if everything is OK so far how would this work when different clients use different operating systems or even different CPU architectures? Do modern ARM CPUs suffer from the 80-bit truncation issue, or would the same code run identically to x86 (if we exclude trickier stuff like trigonometry), assuming the implementation has no bugs?

paavohtl
  • Does C# allow the contraction of a floating-multiply followed by a dependent floating-point add into a fused multiply-add (FMA) at the discretion of the compiler? Does your application make use of any transcendental functions? Does C# specify that all transcendental functions in its math library shall be correctly rounded? – njuffa Dec 31 '18 at 18:53
  • @njuffa According to [this issue](https://github.com/dotnet/coreclr/issues/17541), FMA is not emitted automatically and can only be used via an intrinsic. If they add it to the runtime, it's going to require some sort of opt-in. We'll probably use some transcendental functions (namely trigonometric) for gameplay logic, but could inconsistencies be solved by using a lookup table? – paavohtl Dec 31 '18 at 19:22
  • Does C# make any use of a flush-to-zero feature provided by floating-point hardware (SSE has that, not sure about ARM)? You could supply your own implementations of transcendental functions as long as you are able to tightly constrain the basic arithmetic; you need not use lookup tables. Floating-point arithmetic is not associative the way real-number math is; does C# allow compilers to re-associate floating-point expressions? – njuffa Dec 31 '18 at 19:39
  • @njuffa From quick Googling and browsing of CoreCLR source code I couldn't find any references to flush-to-zero. I'm pretty sure CLR guarantees that floating-point expressions cannot be re-associated, but haven't found confirmation yet. – paavohtl Dec 31 '18 at 20:03
  • Another issue to investigate is whether C# allows a compiler to evaluate floating-point expressions in higher precision than the type of the operands (e.g. is it allowed to evaluate an expression involving `float` operands using `double` arithmetic?). In C/C++ this is specified via `FLT_EVAL_METHOD`, if it is specified at all. Also, with those languages one usually needs to tell the compiler specifically that one wants strict IEEE-754 compliance, e.g. `/fp:strict` (which may inhibit certain performance optimizations). What does the C# tool chain provide in that regard? – njuffa Dec 31 '18 at 20:37
  • @njuffa: I note that this is a question-and-answer site; your comments could be expanded into full questions. To briefly answer them: the C# specification explicitly calls out that any float operation of any precision can be performed at any higher level of precision for any reason at any time, and that this can cause non-deterministic results. It does not specify in any way how instructions are to be generated by the jitter, it does not specify whether denormals go to zero. – Eric Lippert Jan 03 '19 at 01:31
  • @njuffa: The CLR specification -- and of course, C# implementations target the CLR -- requires that storing a single or double to an instance field, static field or array element truncates it back to its "required" precision. (Storing to a local or formal is not required to truncate but is permitted to.) An undocumented but guaranteed feature of C# is that inserting an unnecessary explicit cast of `(float)` or `(double)` will truncate even if doing so is otherwise a no-op. – Eric Lippert Jan 03 '19 at 01:32
  • @njuffa: The C# language specification says nothing about the qualities of the math library; that's up to the library authors. – Eric Lippert Jan 03 '19 at 01:35
  • @njuffa: The C# compiler will not generate code that rejiggers floating point operations; for example, it will not rewrite `x * y + x * z` into `x * (y + z)` or any such thing. The optimizer will remove things like multiplications by compile-time constant 1.0, additions of compile-time constant 0.0, and that sort of thing. Operations solely involving compile-time constants are computed at compile time, with the same notes about that computation being done in any level of higher precision that the compiler chooses. – Eric Lippert Jan 03 '19 at 01:37
  • @EricLippert I know nothing about C#, but a few things about floating-point arithmetic :-) I was looking for clarification of the question's context here, in the hopes of being able to formulate an *answer* once context is provided. – njuffa Jan 03 '19 at 07:01
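
To make the explicit-cast guarantee mentioned in the comments above concrete, here is a minimal sketch (the Lerp helper is purely illustrative):

```csharp
static class TruncationExample
{
    // The redundant (float) cast is guaranteed to truncate the result back to
    // true single precision, even though it is otherwise a no-op; without it,
    // the intermediate expression may legally be evaluated at higher precision.
    public static float Lerp(float a, float b, float t)
        => (float)(a + (b - a) * t);
}
```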

2 Answers


So, if CoreCLR uses SSE FP instructions on x86-64, does that mean that it doesn't suffer from the truncation issues, and/or any other FP-related non-determinism?

If you stay on x86-64 and you use the exact same version of CoreCLR everywhere, it should be deterministic.

If the runtime still uses x87 instructions with unreliable results, would it make sense to use a software float implementation [...] I've prototyped this and it seems to work, but is it unnecessary?

It could be a solution to work around the JIT issue, but you will likely have to develop a Roslyn analyzer to make sure that you are not using floating point operations without going through these... or write an IL rewriter that performs the substitution for you (but that would make your .NET assemblies architecture dependent... which could be acceptable depending on your requirements).
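
For illustration, the kind of wrapper type such an analyzer would enforce might look like this; this is only a sketch, and SoftFloat/SoftFloatMath are hypothetical stand-ins for the software implementation linked in the question:

```csharp
using System;

// Hypothetical stand-in for the software float library; the real one would
// implement IEEE-754 arithmetic on the raw bit patterns.
internal static class SoftFloatMath
{
    public static uint Add(uint a, uint b) => throw new NotImplementedException();
    public static uint Mul(uint a, uint b) => throw new NotImplementedException();
}

// Gameplay code only ever sees SoftFloat; a Roslyn analyzer (or IL rewriter)
// would flag any direct use of float/double in simulation code.
public readonly struct SoftFloat
{
    private readonly uint _bits;   // raw IEEE-754 single-precision bits

    private SoftFloat(uint bits) => _bits = bits;

    public static SoftFloat FromRaw(uint bits) => new SoftFloat(bits);

    public static SoftFloat operator +(SoftFloat a, SoftFloat b)
        => FromRaw(SoftFloatMath.Add(a._bits, b._bits));

    public static SoftFloat operator *(SoftFloat a, SoftFloat b)
        => FromRaw(SoftFloatMath.Mul(a._bits, b._bits));
}
```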

If we can just use normal floating point operations, is there anything we should avoid, like trigonometric functions?

As far as I know, CoreCLR redirects math functions to the platform's C runtime library (libc), so as long as you stay on the same version and the same platform, it should be fine.

Finally, if everything is OK so far how would this work when different clients use different operating systems or even different CPU architectures? Do modern ARM CPUs suffer from the 80-bit truncation issue, or would the same code run identically to x86 (if we exclude trickier stuff like trigonometry), assuming the implementation has no bugs?

You can have some issues that are not related to extra precision. For example, on ARMv7 subnormal floats are flushed to zero, while ARMv8 in AArch64 mode keeps them.

So, assuming that you are staying on ARMv8, I don't know exactly how the CoreCLR JIT for ARMv8 behaves in that regard; you should probably ask on GitHub directly. The behavior of the libc could also still break deterministic results.
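
One way to probe a given runtime/CPU combination is a small sketch like this (the quotient is roughly 1.5e-40, which is subnormal, so it only survives if subnormals are not flushed to zero):

```csharp
using System;

static class SubnormalProbe
{
    // The division happens at runtime (the divisor is a parameter), so the
    // subnormal result is produced by the target's floating point hardware;
    // under flush-to-zero it comes out as exactly 0.
    public static void Run(float divisor)   // e.g. SubnormalProbe.Run(100f)
    {
        float tinyNormal = 1.5e-38f;        // just above the smallest normal float
        float sub = tinyNormal / divisor;
        Console.WriteLine(sub == 0f
            ? "subnormals are flushed to zero"
            : $"subnormal preserved: {sub:R}");
    }
}
```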

We are working on exactly this problem at Unity for our "Burst" compiler, which translates .NET IL to native code. We use LLVM codegen across all machines and disable a few optimizations that could break determinism (so, overall, we can try to guarantee the behavior of the compiler across platforms), and we also use the SLEEF library to provide deterministic implementations of the mathematical functions (see for example https://github.com/shibatch/sleef/issues/187)… so it is possible to do it.

In your position, I would probably first investigate whether CoreCLR is really deterministic for plain floating point operations between x64 and ARMv8… and if that looks okay, you could call the SLEEF functions instead of System.Math and it could work out of the box, or propose that CoreCLR switch from libc to SLEEF.
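
For example, a P/Invoke sketch along these lines could be enough to try it (the native library name depends on how you build SLEEF; Sleef_sin_u10/Sleef_cos_u10 are its 1.0-ULP double-precision functions):

```csharp
using System.Runtime.InteropServices;

static class Sleef
{
    // Assumes a native SLEEF build (libsleef.so / sleef.dll) is on the loader path.
    [DllImport("sleef")]
    private static extern double Sleef_sin_u10(double x);

    [DllImport("sleef")]
    private static extern double Sleef_cos_u10(double x);

    // Replacements for System.Math.Sin/Cos that stay bit-identical across
    // platforms as long as every client ships the same SLEEF build.
    public static double Sin(double x) => Sleef_sin_u10(x);
    public static double Cos(double x) => Sleef_cos_u10(x);
}
```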

xoofx

More like food for thought than a definite answer: you may want to look into number types other than the ones built into .NET. The obvious downside is that the built-in types are not only well understood (hmm) but also have hardware support on pretty much every platform. Still, it may be worth checking out posits, a new, still-in-progress floating point number format.

The posit standard doesn't leave room for interpretation in the way that causes your issue, and it also has a built-in internal accumulator (the quire). Thus posit operations produce deterministic results across platforms - in theory, because hardware implementations are scarce (but they exist!) and no off-the-shelf CPU supports the format natively. In practice you would have to use it as a software number type, though that may only be a problem if such computations sit on a latency-sensitive execution path.

There is also a .NET library for it that you can find here (targets .NET Framework but can very easily be switched over to .NET Standard) which can also be turned into an FPGA hardware implementation. More info is here.

Disclaimer: I'm from the company behind the .NET library (but posit wasn't invented by us).

Piedone
  • IEEE-754 is fully deterministic for + - * / and sqrt (correctly rounded result required, i.e. rounding error <= 0.5ulp). It's only language rules (like allowing contraction of mul+add into FMA), or higher-precision temporaries, that leads to non-determinism. Or calling non-basic FP functions like `sin()` leaves it up to the library what happens. But if all clients use the same library, you can be ok unless one has an ISA extension that allows it to choose something different... – Peter Cordes Mar 14 '19 at 08:13
  • Anyway, very interesting, I hadn't heard about Posit before. Probably not useful for gaming without HW support, though; they often want all clients to run large parts of the game simulation in lockstep. This can include stuff that's worth speeding up with SIMD, because even scalar hardware isn't as fast as they'd like. Maybe some games only need a small amount of state that isn't computationally intensive to stay in sync, though. – Peter Cordes Mar 14 '19 at 08:15
  • Oh, I see posit requires correctly-rounded results from *every* math function, including `log` and `cos`. https://posithub.org/docs/BeatingFloatingPoint.pdf. And that it requires several fused operations to be available. I guess the idea is that if you want them, you need to ask for them explicitly, so compilers would be or could be forbidden from optimizing. (Similar to `#pragma STDC FP_CONTRACT off` in ISO C, I guess) – Peter Cordes Mar 14 '19 at 08:42
  • While I have limited knowledge about the rationale behind all the details, I think the idea is that the big scratchpad for intermediate results between operations (called the quire) is supposed to be cheap, so fused operations that use it should be available on every platform. But it will only be used if you explicitly ask for it, since one of the design principles is that everything should be apparent from the code. – Piedone Mar 14 '19 at 12:18