
------ Please jump to the last update -----------

I have found a bug (in my code) and I am struggling to understand it correctly.

It all boils down to this specific example taken from the immediate window while debugging:

x
0.00023569075
dx
-0.000235702712
x+dx+1f < 1f
true

(float) (x+dx+1f) < 1f
false

x and dx are both of type float. So why is the boolean value different when I do a cast?

In the actual code I had:

x+=dx
if( x+1f < 1f) // Add a one to truncate really small negative values (originally testing x < 0)
{
    // do actions accordingly

    // Later doing
    x+=1; // x < 1 has to be true afterwards, so we have to get rid of really small negatives for which x+=1 would make x == 1 true and x < 1 false.
}

but I am now trying with the cast

x+=dx;
if( (float)( x+1f) < 1f) // Add a one to truncate really small negative values (originally testing x < 0)
{
    // do actions accordingly
    // Later doing
    x+=1; // x < 1 has to be true afterwards, so we have to get rid of really small negatives for which x+=1 would make x == 1 true and x < 1 false.
}

Visual Studio says that the cast is redundant, but I DO get a false positive without it, as the immediate window also told me when:

x+dx+1f < 1f
true

I am currently running my code to see if I get the bug again with the cast and I will update as soon as I get convinced either way.

In the meantime, I hope someone can sort out what's going on here. Can I expect the cast to do something?

Update - Variables

My variables x and dx are components of a Vector2 (XNA/MonoGame). So in the code you should read

Vector2 coord; // the x (and y) components are floats.
Vector2 ds;
coord.X // where it says x 
ds.X    // where it says dx

I thought this would not matter, but maybe it does.
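For completeness, here is a self-contained sketch of the setup (using a minimal stand-in struct in place of MonoGame's Vector2, whose X and Y components are floats):

```csharp
using System;

// Minimal stand-in for XNA/MonoGame's Vector2 (its X and Y components are floats)
struct Vector2
{
    public float X, Y;
    public Vector2(float x, float y) { X = x; Y = y; }
}

class Repro
{
    static void Main()
    {
        Vector2 coord = new Vector2(0.00023569075f, 0f);
        Vector2 ds = new Vector2(-0.000235702712f, 0f);

        // This is what "x + dx + 1f" stands for above; both components are
        // floats, so the static type of the whole sum is float, not double:
        Console.WriteLine((coord.X + ds.X + 1f).GetType()); // System.Single
    }
}
```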

Update 2 - Drop the above example

Seeing that the cast did change the outcome, I made this simple demonstration:

class Program
{
    static void Main(string[] args)
    {
        float a = -2.98023224E-08f; // Just a small negative number I picked...

        Console.WriteLine(((a + 1f) < 1f) ? "true" : "false");          //true

        Console.WriteLine(((float)(a + 1f) < 1f) ? "true":"false");     //false

        // Visual Studio Community 2015 marks the above cast as redundant
        // but there's clearly something fishy going on!

        Console.Read();
    }
}

So, why does this cast change the result when even VS says it is redundant?

Nick Sick
  • 1
    "_I have found a bug_" - this should [**never** be your first assumption](http://www.icodeguru.com/dotnet/debugging-strategies-for-.net-developers/8217final/LiB0010.html). – Soner Gönül Jul 22 '17 at 13:48
  • 1
    I found a bug since my code does not work as I expected. Maybe the bug is in my brain :) – Nick Sick Jul 22 '17 at 14:15

3 Answers


I don't see how you're declaring your variables, but assigning the literal values that you posted to variables makes those variables of type double, not float. And as you know, the double type has greater precision than float.

Here is a test:

var x = 0.00023569075;
var dx = -0.000235702712;
Console.WriteLine(x.GetType()); //result: System.Double
Console.WriteLine(dx.GetType()); //result: System.Double

And of course, when adding two doubles and a float, the result is double, so that's why the first condition returns true:

Console.WriteLine(x+dx+1f < 1f); //returns true
Console.WriteLine(x+dx+1f); //returns 0.999999988038

But when you cast it to float, a truncation occurs and the result is no longer correct, which is why your second condition returns false:

Console.WriteLine((float)(x+dx+1f) < 1f); //returns false
Console.WriteLine((float)(x+dx+1f)); //returns 1

UPDATE: When your variables are float, truncation is at play here. Remember that the maximum precision of float is only 7 digits, and you're assigning numbers with many more digits, so truncation occurs and leads to the inaccurate results that you're witnessing.

In your original question, here is how the values are truncated:

float x = 0.00023569075f;
float dx = -0.000235702712f;
Console.WriteLine(x); //0.0002356907 last digit lost
Console.WriteLine(dx); //-0.0002357027 last two digits lost
Console.WriteLine((x + dx)); //-1.196167E-08
Console.WriteLine((x + dx + 1f)); //1

The reason why the last result is 1 should be obvious. The result of adding x and dx is -1.196167E-08 (-0.00000001196167) which has 7 digits and can fit in float. Now adding 1 makes it 0.99999998803833 which has 14 digits and cannot fit in float so it is truncated and rounded to 1 when stored in a float.

The same thing happens in your update 2. The value -2.98023224E-08f has 9 digits, so it is truncated to -2.980232E-08 (-0.00000002980232). Again, adding 1 to that makes it 0.99999997019768 which is truncated and rounded to 1:

float a = -2.98023224E-08f;
Console.WriteLine(a); //-2.980232E-08 last two digits lost
Console.WriteLine(a + 1f); //1
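The digits that survive in storage can be inspected directly with the "G9" format (nine significant digits are enough to round-trip any float), as a small sketch:

```csharp
using System;

class G9Demo
{
    static void Main()
    {
        float a = -2.98023224E-08f;

        // "G9" prints up to nine significant digits, enough to round-trip a
        // float, so it shows the value that is actually stored:
        Console.WriteLine(a.ToString("G9"));                 // -2.98023224E-08
        Console.WriteLine(((float)(a + 1f)).ToString("G9")); // 1
    }
}
```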

UPDATE 2: Chris commented about the calculation being done at a higher precision, which is absolutely correct, but that alone doesn't explain the results, which should not be affected by it. Yes, the a + 1f calculation is done at a higher precision, but because both operands are float, the result of the calculation is then automatically cast down to float. Manually casting the result to float should then be redundant and shouldn't change the result. More importantly, it does not force the calculation to be done at float precision. Yes, we still get these results:

Console.WriteLine(a + 1f); //1
Console.WriteLine(a + 1f < 1f); //True
Console.WriteLine((float)(a + 1f) < 1f); //False

Thanks to a good debate with Chris and lots of testing on various machines, I think I have a better understanding of what's going on.

When we read:

Floating-point operations may be performed with higher precision than the result type of the operation

The word operations here covers not only the calculations (addition, in our example), but also the comparisons (less than, in our example). So in the second line above, the entire a + 1f < 1f is done at a higher precision: adding the value -2.98023224E-08f (-0.0000000298023224) to 1 results in 0.9999999701976776, which is then compared to 1f and obviously returns true:

Console.WriteLine(a + 1f < 1f); //True

At no time is there any casting to float, because the result of the comparison is bool.

In the first line however, we're simply printing the result of the calculation a+1f, and because both operands are float, the result is automatically cast down to float, which causes it to be truncated and rounded to 1:

Console.WriteLine(a + 1f); //1

Now the big question is about the third line. What's different this time is that the cast forces the result of the calculation down to float, which truncates and rounds it to 1, and then this is compared to 1f. The comparison is still done at a higher precision, but now it doesn't matter because the casting has already changed the result of the calculation:

Console.WriteLine((float)(a + 1f) < 1f); //False

So the casting here is causing the two operations (addition and comparison) to be done separately. Without casting, the steps are: add, compare, print. With casting, the steps are: add, cast, compare, print. Both operations are still done at a higher precision, because casting cannot affect that.
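The two orders of operations can be pinned down in a small sketch; these two results are forced by the explicit conversions, regardless of how much extra precision the runtime uses:

```csharp
using System;

class Steps
{
    static void Main()
    {
        float a = -2.98023224E-08f; // exactly -2^-25, half an ulp below 1f

        // add, cast, compare: the cast narrows the sum to float, and
        // 1 - 2^-25 rounds to exactly 1f (round-to-nearest-even), so:
        Console.WriteLine((float)(a + 1f) < 1f);  // False

        // add in double, compare: the tiny difference survives, so:
        Console.WriteLine((double)a + 1.0 < 1.0); // True
    }
}
```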

Perhaps Visual Studio is saying that the casting is redundant because it is not taking into account whether the operations will be done at a higher precision or not.

Racil Hilan
  • My variables are not set by static values. I changed their names it is actually: Vector2 Coord.X where i just put in x. Vector2 is xna/monogame struct. But afaik Coord.X is a float. – Nick Sick Jul 22 '17 at 13:58
  • OK, we need to see your code for `Vector2` and `Coord.X` to judge, but the point is the result of your calculation is `double`, not `float`. Check the type of the result like this `(Vector2+Coord.X+1f).GetType()` and see what you get. – Racil Hilan Jul 22 '17 at 14:04
  • But as I said. Visual Studio thinks my (float) cast is redundant. They are all floats! `(coord.X+ds.X+1f).GetType()` says `single`. – Nick Sick Jul 22 '17 at 14:27
  • Yes. Truncation is the effect I was looking for. That is why I add `1f`. What I don't understand is why `a + 1f` is not truncated **before** comparing for inequality in `(a + 1f < 1f)` and why a `(float)` cast as `(float)(a + 1f) < 1f` changes the result. Somehow it keeps a higher precision (double?) without the cast. The explanation as "_It's not really changing the result, it's just the inaccuracy of float point numbers at play._" does not feel fully satisfactory. – Nick Sick Jul 22 '17 at 18:22
  • Read my update again more carefully. The result of the addition is truncated to `0.9999999`, and `(float)` does not change the value, but just treats it as `1`. In the same way printing the result `Console.WriteLine(x + dx)` doesn't change the value, but still prints `1`. – Racil Hilan Jul 22 '17 at 18:26
  • 1
    Thinking about whether the calc is single or double made me think of this: https://stackoverflow.com/questions/2342396/why-does-this-floating-point-calculation-give-different-results-on-different-mac - in particular that floating point calculations can be done at higher precision than you expect. No idea if this could explain it but figured it may be relevant to the conversation. – Chris Jul 22 '17 at 18:29
  • @Chris, interesting reading. I really think there is something relevant to it. @Racil, How do you know that the addition gets truncated to `0.999999` exactly? Putting `(float) 0.999999` will not truncate to `1`. If the result after the addition fits in a single it would not get truncated because of a cast. That would just be stupid (IMHO) – Nick Sick Jul 22 '17 at 18:47
  • @Chris Not exactly, but related. Yes, the addition can be done at a higher precision and then the result will be truncated to 7 digits to fit in `float`, but all that happens before the casting `(float)`. To proof that it has nothing to do with the higher precision calculation, try this `float z=0.99999999f`. Print it and you still get `1`, but this time the casting will not treat it as `1` and will return `false`. So it's not only the calculation happens at a higher precision, but the storage also treats it differently. – Racil Hilan Jul 22 '17 at 18:51
  • I think it probably is actually relevant. If you change `Console.WriteLine(((float)(a + 1f) < 1f));` (which gives false) to `Console.WriteLine(((double)(a + 1f) < 1f)); ` then it prints `true` which suggests to me that the intermediate calculation is stored as a double (or higher) precision number. – Chris Jul 22 '17 at 18:52
  • @Chris yes, I said **related**, and **related** and **relevant** are almost equal when we compare them at lower linguistic precision :-) – Racil Hilan Jul 22 '17 at 18:54
  • @RacilHilan: I wasn't disagreeing with you, just adding exactly why I thought it was relevant. – Chris Jul 22 '17 at 18:55
  • @Chris Yes I know, I was just adding some humor :-) – Racil Hilan Jul 22 '17 at 18:57
  • Ah. Should have known from the smiley! ;-) – Chris Jul 22 '17 at 18:57
  • @NickSick How do I know? Just read the docs. The precision of `float` is 7 digits. Also in my update, I printed them out and you can see how they lose all digits beyond 7. That's how I know :-). Now to answer your point about `0.9999999`, that has only 7 digits and will not be truncated or treated as `1`. The value needs to have more digits for that to happen. That's just how floating point numbers behave. – Racil Hilan Jul 22 '17 at 19:00
  • @Racil, _"How do I know? Just read the docs. ..."_ Well, what I meant was How do you now it is _intermediately_ stored like 0.9999999 and not (for instance) 0.99999998... (which of course is at higher precision than a float can have)? And if it actually was stored as 0.9999999 then it would not change with a cast, since as you otherwhere stated: A float is a float with exactly one representation. – Nick Sick Jul 24 '17 at 11:59
  • I don't know what you mean by *intermediately*. If you mean *internally*, then the answer is yes, it is stored internally with precision of 9. This is why when assigning it a value of `x=0.9999999f` or `x=0.99999996f`, the results will be different. Printing them with `Console.WriteLine(x)` will result in `0.9999999` for both, but printing them with `Console.WriteLine(x.ToString("G9"))` (to get all internal digits) results in `0.9999999` and `0.99999994`. Notice the last digit `6` is changed to `4` depending on the machine. – Racil Hilan Jul 24 '17 at 15:16
  • 1
    Thanks to a good debate with Chris and lots of testing on various machines, I think I have a better understanding of what's going on. See my update 2. – Racil Hilan Jul 24 '17 at 23:43
  • @Racil, _"The word operations here is not only the calculations (addition, in our example), but also the comparisons (less than, in our example). So in the second line above, the entire a + 1f < 1f is done at a higher precision"_ I like this part of your update 2 because even if that was already in my line of thinking it is good to put it down in words. – Nick Sick Jul 25 '17 at 00:38

I think the important part of the c# spec here is this:

"Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an "extended" or "long double" floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. " - https://msdn.microsoft.com/en-us/library/aa691146(v=vs.71).aspx

We can infer that this is almost certainly what is happening by looking at these three lines of code, doing the comparison in slightly different ways:

float a = -2.98023224E-08f;
Console.WriteLine((a + 1f) < 1f); // True
Console.WriteLine((float)(a + 1f) < 1f); //False
Console.WriteLine((double)(a + 1f) < 1f); //True

As you can see, the result of the first comparison (which is what we are wondering about) is the same as when the intermediate value is cast to double, telling us that the compiler is taking advantage of the option of performing the calculation at a higher precision.

Of course, the reason the results differ is that although we can see the comparison should be true, when a+1f is computed and stored as a single the result is exactly 1, hence the comparison being false.

And just to round this off: a in the above is stored in a float with an exponent of -25 and a fraction of 0. If you add 1 to this, the 2^-25 part is too small to be represented, so the sum needs to round, and in this case the rounding leaves the number at exactly 1. This is because of the way single-precision floating-point numbers are stored: they have only 23 bits for the part following the leading 1, so there is not enough precision to hold the tiny fraction, and it ends up rounding to exactly 1 when stored. Hence the comparison returns false when we force it to use float calculations all the way.
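That bit-level story can be checked directly (a sketch assuming `BitConverter.SingleToInt32Bits`, available in .NET Core; on older frameworks `BitConverter.GetBytes` works too):

```csharp
using System;

class Bits
{
    static void Main()
    {
        float a = -2.98023224E-08f;

        // sign 1, biased exponent 102 (i.e. 2^-25), fraction 0:
        Console.WriteLine(BitConverter.SingleToInt32Bits(a).ToString("X8"));
        // B3000000

        // after rounding, the sum has exactly the bit pattern of 1f:
        Console.WriteLine(BitConverter.SingleToInt32Bits((float)(a + 1f)).ToString("X8"));
        // 3F800000
    }
}
```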

Chris
  • Even though I'm not 100% sure about this, I will choose this as the accepted answer. At least until someone convinces me otherwise. – Nick Sick Jul 23 '17 at 09:43
  • @RacilHilan: The last paragraph was explaining why the exact result of `a+1f` would not fit into a float. The earlier part was explaining that the calculation was done at a higher precision (eg at double precision or better). Your comment "why the 0.9999999 value is sometimes treated as 1" suggests some confusion - a number in .NET is never "sometimes treated" as one thing and other times another as you imply. A number is an exact sequence of bits that have meaning in the framework. Once it is calculated it is always treated as *exactly* that sequence of bits. – Chris Jul 24 '17 at 08:47
  • In this case `a+1f` as a float is *exactly* 1 because as I attempted to explain in the last paragraph the float cannot hold the precision necessary to include the very small difference that it should actually have from 1. "Secondly, casting the result to float, does not in any way force the calculation to float." - it does in that the whole expression `a+1f < 1f` calculates the left hand side as a double, then compares it to the right hand side as a double because as I said it is allowed to calculate things as a higher precision. – Chris Jul 24 '17 at 08:50
  • If you have the cast to float in there then it forces the intermediate value to be a float and thus be truncated to exactly 1 before the comparison takes place, hence it calculates `1f < 1f` and produces false. – Chris Jul 24 '17 at 08:51
  • 2
    Lastly I didn't quote "Other than delivering more precise results, this rarely has any measurable effects." because it isn't relevant. The bit you have quoted explicitly says that *sometimes* it has a measurable effect and since we are seeing a measurable effect it seemed pointless to quote it. I hope this has explained things to you a bit more clearly. I am happy to try to explain anything further. I'd certainly advise going and look at the bit representation of the result of `(float)a+1f` so you can understand that the answer is 1 and not 0.99999997019768... – Chris Jul 24 '17 at 08:54
  • Also I have realised you were right that I was getting my binary representation wrong. It should have been a lot of 1s rather than a lot of 0s. Have updated to talk more about precision and less about the exact floating point representation. – Chris Jul 24 '17 at 10:02
  • @Chris, @Racil, I must thank you guys for your dedication to provide me (and the community, of course) with a correct answer to this question. You are both well repped but as the OP I have to try to choose the best answer and when doing that my 2 cents are: In all cases where we store the result in a float `z=a+1f` or cast `(float) (a+1f)` the results are the same and equal to 1. The only case where we did not force the calculation to a float before the comparison `a+1f < 1f` _(true)_ , we got indication of a _higher precision_. – Nick Sick Jul 24 '17 at 11:31
  • So guys, until proven otherwise, if it smells like a fish, looks like a fish and swims like a fish it probably is a fish. – Nick Sick Jul 24 '17 at 11:31
  • `((a+1f)<1f)` is being treated as `((double)(a+1f)<1f)` because of the quoted text - the compiler is allowed to do things at a higher precision if it wants to. Printing `(a+1f)` and `(float)(a+1f)` will print 1 because the result of that is 1. a+1f can't fit into a float so it is rounded to exactly 1 and that is how it is stored and used from then on. Your multiplying it by itself again works because it does the calculation at higher precision before converting it back to a float. This time the result is slightly smaller and so it gets rounded to a number slightly less than 1. (cont.) – Chris Jul 24 '17 at 15:35
  • 1
    If you try instead `((float)(a+1f))*((float)(a+1f))` to force it to use floats for all intermediate calculations then you will find the result is 1. Because you are now just doing 1*1. Alternatively do `float b = a+1f; Console.WriteLine(b*b);` If you do the latter then you will find no calculations that use b and treat it as anything other than 1. It only gets treated as something less than 1 if you are doing calculations at a higher precision than float. – Chris Jul 24 '17 at 15:38
  • @RacilHilan: Are you sure about that? If I run `float a = -2.98023224E-08f; float z=a+1f; Console.WriteLine(z< 1f);` then I get "False" printed which is what I would expect. – Chris Jul 24 '17 at 15:57
  • I am doing my tests primarily using Linqpad on a windows PC with Visual Studio 2015. I've also tested my most recent comments code by sticking it in a program and compiling with csc version 4.7.2053.0. Its worth noting that it just says it *may* do the calculations at higher resolution. I couldn't find details on how dotnetfiddle works but I guess it just choose not to take advantage of that option. – Chris Jul 24 '17 at 16:26
  • So after some looking at rextester.com I think the big difference is in compiler optimisations. Because we are putting in static values the compiler can actually do a lot of the maths at compile time and not bother actually having the `z` or `a` variables exist at all. Here's a test program I wrote: http://rextester.com/EERW16443 . The two methods Test1 and Test2 are identical except that Test1 has this `Console.WriteLine(a.ToString("R"));`. And this one line added changes the results to what we would expect (z==1 is the only test that returns true). – Chris Jul 24 '17 at 16:37
  • I am a lot more unclear on the reasons for this but my best guess is that it actually just compiles the method with unexpected results such that `Console.WriteLine(z < 1f);` in the compiled code is just `Console.WriteLine(true);`. In this case it is the compiler precomputing things at higher precision. As I say it's hard to be sure of this because I can't actually analyze the compiled code on their system to see what it is doing. – Chris Jul 24 '17 at 16:39
  • Oh, and forgot to note in the last comment that I think the `Console.WriteLine(a.ToString("R"));` changes things because `a.ToString()` requires `a` to exist so it can't be optimised away and so the optimizations don't happen... – Chris Jul 24 '17 at 16:46
  • 1
    Thank you Chris, our valuable debate and the tests I've done made me understand it better. You can see my update 2, if you're curious. – Racil Hilan Jul 24 '17 at 23:56

Floats are stored in BINARY: the IEEE floating-point standard represents numbers as a binary mantissa and a binary exponent (powers of 2). Many decimal numbers cannot be represented exactly in this format, so the compiler uses the nearest binary IEEE floating-point number that is available.

So since it is not exactly correct, no matter how small the difference actually is, the comparison fails. Calculate the difference and you will see how small it is:

var diff = (float)(x+dx+1f) - 1f;

If you use the decimal type, it would probably work.
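A quick sketch of that suggestion with the original values; decimal stores these base-10 literals exactly, so the tiny negative difference survives the addition:

```csharp
using System;

class DecimalDemo
{
    static void Main()
    {
        decimal x = 0.00023569075m;
        decimal dx = -0.000235702712m;

        Console.WriteLine(x + dx);           // -0.000000011962
        Console.WriteLine(x + dx + 1m < 1m); // True
    }
}
```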

Charles Bretana
  • Well I do know about the nature of floating points but I do not think that is an answer to my question. I have now updated my question, both title and with simpler demonstration snippet. – Nick Sick Jul 22 '17 at 15:39
  • Well, as another answer above noted, it is not clear what the type of `x` and `dx` are declared as, since you did not include those declaration statements in your code, but `0.000235690` and `-0.000235702712` are both literals, of type double, not float, so the value of any variable stored as a float or double is almost guaranteed to not be equal to those literal values. That's the more important point here. – Charles Bretana Jul 23 '17 at 12:54
  • bottom line is you should never use floating point numbers when you are counting things. Use integers for that. Floating point numbers are for *measuring* things. And when measuring things, it is not meaningful to attempt to determine whether one value is *exactly* the same as another. The only thing that matters is *how much they differ* . – Charles Bretana Jul 23 '17 at 13:01
  • _"it is not clear what the type of x and dx"_, Well I clearly said "x and dx are both of type float" just under the code example. Since I now updated my question, the **Update 2** example is more easy to talk about. Did you have answer to why `a+1f<1f` _(true)_ differs from `(float)(a+1f)<1f` _(false)_ ? ( `a` is a `float` :) – Nick Sick Jul 24 '17 at 15:25
  • no, bottom line. Comparisons that involve expectations of equality should NEVER be performed with floating point numbers. Conversion from decimal literals to binary floats, or from binary floats to binary doubles, almost always involve approximations that make any expectation of equality comparisons just not even wrong, they are foolish. – Charles Bretana Aug 03 '17 at 23:50