1

Background

So I'm making a simple game of unfair hangman in C# (there's a list of words that are valid solutions based on the information I've to the player (e.g. word length) which dwindles down over time, and the player only wins when there's only one possible solution and they've guessed all the letters).
Anyway, the important thing to know is I have a line of code that looks like this:
availableWords.RemoveAll(word => AmountInCommon(word) != lowestInCommon);
(I remove all the words from the availableWords that have more letters in common with the letters the user has guessed than the word in availableWords with the least letters in common)
The problem is that this code only works when compiler optimizations are turned off.

Demo code showcasing the problem

class Program {
    static void Main() {
        float randomPowerOf2 = (float)Math.Pow(2, new Random().Next(-4, 5));

        float foo1 = Foo(3f); // some non power of 2
        float foo2 = Foo(randomPowerOf2);
        float baz = Baz();

        // prints false with compiler optimizations; true without compiler optimizations
        Console.WriteLine(Math.Round(Foo(3f), 2) == Math.Round(foo1, 2));

        // prints true with and without compiler optimizations
        Console.WriteLine(Math.Round(Foo(3f), 2) == Math.Round(foo1, 2));

        // prints true with and without compiler optimizations
        Console.WriteLine(Foo(randomPowerOf2) == foo2);

        // prints true with and without compiler optimizations
        Console.WriteLine(Baz() == baz);

        Console.ReadLine();
    }
    static float Foo(float divisor) {
        return 1 / divisor;
    }
    static float Baz() {
        return 1 / 3f;
    }
}

I assume this has something to do with how floating-point numbers work, but what confuses me is why toggling on and off compiler optimizations affects it. I'm guessing what I should do is just round the floats to 3 or so decimals of precision, since that's all I really need. Why does this happen and is my solution best-practice?

Edit:

I realized that in this case since the word length doesn't change I could just have AmountInCommon return the integer representing how many letters are in common instead of a float that represents the ratio of letters in common to word length. Your answers are still helpful though.

Aaron Record
  • 330
  • 4
  • 13
  • 2
    [This SO question](https://stackoverflow.com/q/6683059/3791245) and its answers seem highly relevant for the _why_ part of your question. Key takeaway is that JIT optimization is free to mess around with bit levels for floats all it wants. – Sean Skelly Oct 21 '20 at 23:02
  • @SeanSkelly looks interesting, thanks – Aaron Record Oct 21 '20 at 23:05
  • I'm not sure if its 'best practice', but I feel your solution is on the right track. Rounding is ultimately an Epsilon approach; it just defines an epsilon value based on the sig figs you give to the rounding. Just be careful when trying to design a more general answer to this problem. See [this answer](https://stackoverflow.com/a/2411661/3791245) for a simple Epsilon implementation: `Math.Abs(x-y) <= epsilon`. Rounding to 3 places basically sets epsilon to 1E-3. You could probably choose a smaller epsilon. – Sean Skelly Oct 21 '20 at 23:06

1 Answers1

3

The short story

Don't compare floating point values for equality.

The first thing programmers usually do after making this remark, is to recommend using an Epsilon. However it's wrong. float.Epsilon is the smallest possible float greater than zero, however (and more importantly) it does not mean that it's the smallest difference between any two arbitrary floats. When comparing floats, the better practice is to choose a reasonable value to determine if two floats are "close enough" to be equal.

The problem with your code you are rounding the numbers first, which may include a significant differences that pops the round one way or the other.

The long story

From the ECMA Specs

9.3.7 Floating-point types

C# supports two floating-point types: float and double. The float and double types are represented using the 32-bit single-precision and 64-bit double-precision IEC 60559 formats...

...

Floating-point operations may be performed with higher precision than the result type of the operation. [Example: Some hardware architectures support an “extended” or “long double” floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type.

...

Further more

The IEC 60559 standard doesn't specify exact values for the parameters of the extended floating-point format. It only specifies minimum values. The extended format used by Intel processors since the 8087 is 80 bits long, with 15 bits for the exponent and 64 bits for the significand. Unlike other formats, the extended format does leave room for the leading bit of the significand, enabling certain processor optimizations etc.

Example

enter image description here

The compilers

Now comes in the compilers. The compiler and jitter will try to optimise code at various points before it's executed on the CPU depending on how it is set to optimize (among other things).

Those optimisation will affect what is stored in memory, how, and what is set in registers and reorganise, as well as many other subtle instructions that may or may not happen (depending on a range of factors). Each one of these optimisation has the potential to non-deterministically change the result of floating point calculation. Which is your exact problem.

In Summary

When it comes to floating point arithmetic, don't ever rely on floating point equality. Additionally don't try and deterministically rely on results from the compiler, the jitter, your Cpu, the architecture, the platform, the release or the bitness of the application.

Compare floating point types within an expectable range (to you).

halfer
  • 19,824
  • 17
  • 99
  • 186
TheGeneral
  • 79,002
  • 9
  • 103
  • 141
  • 1
    That TLDR really needs to be emphasised when teaching someone to code as soon as you touch any arithmetic (i.e. bloody early on). The number of times I've been asked "why isn't my calculation working?" is too damn high. – NPras Oct 22 '20 at 00:53
  • I agree that floating point equality comparisons shouldn't be done _directly_ (with `==`), and also that rounding can be misleading. But I think the answer can be improved with an explicit code implementation of how you think the comparison _should_ be performed, if `==` shouldn't be used. Presumably something like `Math.Abs(x - y) < epsilon`, where users choose a reasonable value for `epsilon`. If OP thinks rounding to 3 places is enough, perhaps an epsilon of 1E-3 is good enough for OP's particular comparison. – Sean Skelly Oct 22 '20 at 19:32
  • @SeanSkelly thanks for the input, ill update it today – TheGeneral Oct 22 '20 at 21:19