Floating point operations ambiguity

Question

Possible Duplicate:
Why is floating point arithmetic in C# imprecise?

Why is there a bias in floating-point operations? Is there a specific reason? The code below prints 160 and 139:

    static void Main()
    {
        float x = (float) 1.6;
        int y = (int)(x * 100);
        float a = (float) 1.4;
        int b = (int)(a * 100);
        Console.WriteLine(y);
        Console.WriteLine(b);
        Console.ReadKey();
    }

This also does not work for `1.3` but works for `1.2` and `1.1` — Nikhil Agrawal, Sep 15 '12 at 08:35
Read this http://stackoverflow.com/questions/1018231/problem-converting-from-int-to-float — Nikhil Agrawal, Sep 15 '12 at 08:37
[What Every Programmer Should Know About Floating-Point Arithmetic](http://floating-point-gui.de/) — nneonneo, Sep 15 '12 at 08:45
hi john! not precisely that case. in that question the focus was on storing floating numbers. however in present case [1.6(8/5) and 1.4(7/5)] there should not really be a difference. however that is not the case evidently. — Amber, Sep 15 '12 at 08:48

score 3 · Answer 1 · answered Sep 16 '12 at 12:20

Any rational number whose denominator (in lowest terms) is not a power of 2 has an infinite number of digits when written in binary. Here you have 8/5 and 7/5, so neither value has an exact binary floating-point representation (unless you have infinite memory).

The exact binary representation of 1.6 is 1.10011001100110011001100110011001100...
The exact binary representation of 1.4 is 1.01100110011001100110011001100110011...
Both values have an infinite number of digits (ignoring the binary point, the group 1100 repeats endlessly).
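
The expansions above can be reproduced with long division in base 2. The following is only a sketch; `BinaryExpansion` and `Digits` are names made up for illustration, and the method assumes a non-negative numerator and a positive denominator.

```csharp
using System;
using System.Text;

static class BinaryExpansion
{
    // Returns the integer part and the first `fractionBits` binary digits
    // of numerator/denominator, computed by long division in base 2.
    // Assumes numerator >= 0 and denominator > 0.
    public static string Digits(long numerator, long denominator, int fractionBits)
    {
        var sb = new StringBuilder();
        sb.Append(Convert.ToString(numerator / denominator, 2)); // integer part
        sb.Append('.');
        long remainder = numerator % denominator;
        for (int i = 0; i < fractionBits; i++)
        {
            remainder *= 2;
            if (remainder >= denominator)
            {
                sb.Append('1');
                remainder -= denominator;
            }
            else
            {
                sb.Append('0');
            }
        }
        return sb.ToString();
    }
}
```

`BinaryExpansion.Digits(8, 5, 12)` returns `"1.100110011001"` and `BinaryExpansion.Digits(7, 5, 12)` returns `"1.011001100110"`. The remainder never reaches 0, so the digits keep cycling however many bits you ask for.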

A float has a precision of 24 bits (23 stored bits plus an implicit leading 1), so the binary representation of any value is rounded to 24 significant bits. Rounding the two values to 24 bits gives:
1.6: 110011001100110011001101 (decimal 13421773), rounded up
1.4: 101100110011001100110011 (decimal 11744051), rounded down

Both values have an exponent of 0 (the first bit has the weight 2^0 = 1, the second 2^-1 = 0.5, and so on).
Read as a 24-bit integer, the first bit has the weight 2^23, so you get the exact decimal values by dividing the 24-bit integers (13421773 and 11744051) by 2^23 = 8388608.
The values are 1.60000002384185791015625 and 1.39999997615814208984375.
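
These numbers can be checked by reading the bits of the float directly. The sketch below uses hypothetical names (`FloatBits`, `Significand`, `ExactValue`). It only handles normal values in [1, 2), where the exponent is 0 as in the two cases here, and it uses decimal for the division because decimal has enough digits to hold the result exactly.

```csharp
using System;

static class FloatBits
{
    // 24-bit significand of a normal float:
    // the 23 stored fraction bits plus the implicit leading 1.
    public static int Significand(float value)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        return (bits & 0x7FFFFF) | 0x800000;
    }

    // Exact value of a float in [1, 2): significand / 2^23.
    // Not valid for other exponents, subnormals, NaN or infinity.
    public static decimal ExactValue(float value)
    {
        return Significand(value) / 8388608m; // 8388608 = 2^23
    }
}
```

`FloatBits.Significand(1.6f)` is 13421773 and `FloatBits.Significand(1.4f)` is 11744051. `FloatBits.ExactValue` returns 1.60000002384185791015625 and 1.39999997615814208984375 for the same inputs.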

When using floating-point types you always have to keep in mind that their precision is finite. Values that can be written exactly in decimal may be rounded up or down when represented in binary. Casting to int does not account for that because it truncates the value. When you want the nearest integer, use something like Math.Round instead.
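
A small sketch of the difference between truncating and rounding, with made-up names (`Conversion`, `TruncatedPercent`, `RoundedPercent`). The float is widened to double explicitly before multiplying. C# permits floating-point operations to be carried out at higher precision than the declared type, so the result of `(int)(a * 100)` on a float can depend on the runtime. The explicit widening is an assumption made here to keep the example reproducible.

```csharp
using System;

static class Conversion
{
    // Truncates toward zero, which is what a plain (int) cast does.
    public static int TruncatedPercent(float value)
    {
        return (int)((double)value * 100);
    }

    // Rounds to the nearest integer before converting.
    public static int RoundedPercent(float value)
    {
        return (int)Math.Round((double)value * 100);
    }
}
```

`Conversion.TruncatedPercent(1.4f)` gives 139 while `Conversion.RoundedPercent(1.4f)` gives 140. For 1.6f both give 160. Note that `Math.Round` rounds exact midpoints to the nearest even integer by default. Pass `MidpointRounding.AwayFromZero` if you want the schoolbook rule.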

If you really need an exact representation of rational numbers you need a completely different approach. Since rational numbers are fractions, you can use integers to represent them. Here is an example of how you can achieve that.
However, you cannot write `Rational x = (Rational)1.6` then. You have to write something like `Rational x = new Rational(8, 5)` (or `new Rational(16, 10)` etc.).
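
To give an idea of what such a type could look like, here is a minimal sketch. It is a hypothetical `Rational` written for illustration, not the example the answer refers to. It supports only multiplication by an integer and truncation, and it does no overflow checking.

```csharp
using System;

// Hypothetical minimal rational type, kept in lowest terms.
struct Rational
{
    public readonly long Numerator;
    public readonly long Denominator;

    public Rational(long numerator, long denominator)
    {
        if (denominator == 0) throw new DivideByZeroException();
        if (denominator < 0) { numerator = -numerator; denominator = -denominator; }
        long g = Gcd(Math.Abs(numerator), denominator);
        Numerator = numerator / g;
        Denominator = denominator / g;
    }

    // Euclid's algorithm; returns b's partner a when b reaches 0.
    static long Gcd(long a, long b)
    {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }

    public static Rational operator *(Rational r, long factor)
    {
        return new Rational(r.Numerator * factor, r.Denominator);
    }

    // Integer part, truncated toward zero like an (int) cast.
    public long Truncate()
    {
        return Numerator / Denominator;
    }
}
```

With this sketch, `(new Rational(7, 5) * 100).Truncate()` is 140 and `(new Rational(8, 5) * 100).Truncate()` is 160, because nothing is ever rounded to binary.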

score 2 · Answer 2 · mathematician1975 · 2012-09-15T08:53:38.770

This is due to the fact that floating point arithmetic is not precise. When you set a to 1.4, internally it may not be exactly 1.4, just as close as can be made with machine precision. If it is fractionally less than 1.4, then multiplying by 100 and casting to integer will take only the integer portion which in this case would be 139. You will get far more technically precise answers but essentially this is what is happening.

In the case of your output for the 1.6 case, the floating point representation may actually be minutely larger than 1.6 and so when you multiply by 100, the total is slightly larger than 160 and so the integer cast gives you what you expect. The fact is that there is simply not enough precision available in a computer to store every real number exactly.

See this link for details of the conversion from floating point to integer types http://msdn.microsoft.com/en-us/library/aa691289%28v=vs.71%29.aspx - it has its own section.

edited Sep 15 '12 at 08:53

answered Sep 15 '12 at 08:36

mathematician1975

But integer casting takes the integer portion of a float does it not? – mathematician1975 Sep 15 '12 at 08:43
139.999999999 is more digits than can fit in a `float`. Consequently, it probably gets rounded up to 140 *in the compiler*. – nneonneo Sep 15 '12 at 08:44
I thought that rounding and casting were entirely different things. I am pretty sure that integer casting takes the integer part of a float and thus yields the same value as that obtained by a floor function – mathematician1975 Sep 15 '12 at 08:46
if 140 -> 139.9999 by the compiler, then why not the same treatment for 160 when both cases do not involve division leading to an infinite precision floating number(at least in mathematics). – Amber Sep 15 '12 at 09:00
@Amber No. Let's say that 1.4 is represented internally as 1.3999449 (just for example). Then multiply by 100 and we have the result 139.99449. This is a floating-point value that you cast to int, and the integer part of that is 139. Now consider that 1.6 is stored as 1.6000001. Multiply by 100 and the result is 160.00001, of which the integer part is 160. – mathematician1975 Sep 15 '12 at 09:04
yea that's exactly my point. why would a language treatment be as varying as you just mentioned? now that we've seen the case we know(or assume) how it is, but it surely defies logic. – Amber Sep 15 '12 at 09:23
@Amber But how else can it work? A computer has finite precision. It is nothing to do with the language, it is the nature of floating point arithmetic. The standard used to store floating point values may simply not be able to exactly represent your values and so internally it is stored as precisely as the machine allows. I think you need to read some literature on numerical analysis and floating point arithmetic. – mathematician1975 Sep 15 '12 at 09:53
that is the problem. no solution to date does not mean there will never be any. it's not about representation, it has got more to do with limitations. – Amber Sep 15 '12 at 10:08
@Amber: It is about representation: If binary floating-point arithmetic is altered so that `x = 1.4;…(int) (x*100)` produces 140, then it must be that x is set to something slightly more than 1.4 (since exactly 1.4 cannot be represented). But then `x = 1.4;…(int) ((2-x)*100)` will produce 59, since the fact that x is slightly more than 1.4 means `2-x` is slightly less than .6. It is mathematically impossible to make binary floating-point arithmetic work as if it were decimal arithmetic. – Eric Postpischil Sep 16 '12 at 00:17

score 1 · Answer 3 · answered Sep 15 '12 at 15:29

The floating-point types float (32 bit) and double (64 bit) have limited precision and, moreover, the value is represented in binary internally. Just as you cannot represent 1/7 precisely in a decimal system (~ 0.1428571428571428...), you cannot represent 1/10 precisely in a binary system.

You can, however, use the decimal type. It still has limited (though high) precision, but the numbers are represented in decimal internally. Therefore a value like 1/10 is represented exactly, as 0.1000000000000000000000000000, internally. 1/7 is still a problem for decimal. But at least you don't lose precision by converting to binary and then back to decimal.

Consider using decimal.
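
A short sketch of what that looks like for the values from the question. The names `DecimalDemo`, `Percent` and `SeventhRoundTrips` are made up for illustration.

```csharp
using System;

static class DecimalDemo
{
    // 1.4m and 1.6m are stored exactly, so the product is exact
    // and truncating it with (int) is harmless.
    public static int Percent(decimal value)
    {
        return (int)(value * 100);
    }

    // 1/7 has no finite decimal expansion, so dividing and then
    // multiplying back does not return exactly 1.
    public static bool SeventhRoundTrips()
    {
        return (1m / 7m) * 7m == 1m;
    }
}
```

`DecimalDemo.Percent(1.4m)` is 140 and `DecimalDemo.Percent(1.6m)` is 160, while `DecimalDemo.SeventhRoundTrips()` is false. decimal removes the surprise for values you can write down in decimal, not for fractions in general.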
