Problem converting from int to float

Question

There is a strange behavior I cannot understand. I accept that floating-point numbers are approximations, so even an operation that obviously should return a whole number can come out as something with a fractional part.

I'm doing this:

int num = (int)(195.95F * 100);

and since it's a floating-point operation I get 19594 instead of 19595. That much is kind of expected.

What puzzles me is that if I do

float flo = 195.95F * 100;
int num = (int) flo;

I get the correct result of 19595.

Any idea of why this happens?

Out of curiosity, what happens when you use Convert.ToInt32? – AvatarOfChronos, Jun 19 '09 at 14:27
Converting with Convert.ToInt32 returns the correct result in both scenarios. – CodeClimber, Jun 19 '09 at 14:30
Related to the section [Differences Among IEEE 754 Implementations](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#3098) which appears on the linked webpage as an addition by an anonymous author to the text _What Every Computer Scientist Should Know About Floating-Point Arithmetic_. — Jeppe Stig Nielsen, Jul 25 '13 at 15:13
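
On the Convert.ToInt32 comments above: a likely reason it gives 19595 in both scenarios is that Convert.ToInt32 rounds to the nearest integer, while an (int) cast truncates toward zero. The sketch below is only an illustration. The class and method names are made up for this example, and the product is widened to double explicitly so that the result does not depend on what precision the JIT happens to use.

```csharp
using System;

static class RoundingVersusTruncationSketch
{
    // An (int) cast drops the fractional part, so a product that lands
    // just below 19595 becomes 19594.
    public static int Truncate(float f, int i)
    {
        double product = (double)f * i; // about 19594.9997 for 195.95F * 100
        return (int)product;
    }

    // Convert.ToInt32 rounds to the nearest integer instead of truncating.
    public static int Round(float f, int i)
    {
        double product = (double)f * i;
        return Convert.ToInt32(product);
    }

    static void Main()
    {
        Console.WriteLine(Truncate(195.95F, 100)); // 19594
        Console.WriteLine(Round(195.95F, 100));    // 19595
    }
}
```

Rounding hides the tiny representation error here, but it also changes the result for inputs whose fractional part is genuinely 0.5 or more, so it is not a drop-in replacement for truncation.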

Marc Gravell · Accepted Answer · 2009-06-19T14:45:37.333

11

I checked whether this was the compiler doing the math at compile time, but it behaves the same way even if you force the multiplication to happen at run time:

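using System.Runtime.CompilerServices; // MethodImpl, MethodImplOptions

class Program
{
    static void Main()
    {
        int i = (int)(GetF() * GetI()); // 19594: product converted straight from the stack
        float f = GetF() * GetI();      // product stored to a float local first
        int j = (int)f; // 19595
    }
    // NoInlining stops the values being folded into Main as constants.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int GetI() { return 100; }
    [MethodImpl(MethodImplOptions.NoInlining)]
    static float GetF() { return 195.95F; }
}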

It looks like the difference is whether the product stays in a register (which can be wider than a 4-byte r4) or is forced into a float variable:

L_0001: call float32 Program::GetF()
L_0006: call int32 Program::GetI()
L_000b: conv.r4 
L_000c: mul 
L_000d: conv.i4 
L_000e: stloc.0

vs

L_000f: call float32 Program::GetF()
L_0014: call int32 Program::GetI()
L_0019: conv.r4 
L_001a: mul 
L_001b: stloc.1 
L_001c: ldloc.1 
L_001d: conv.i4 
L_001e: stloc.2

The only difference is the stloc.1 / ldloc.1.

This is supported by the fact that an optimised build (which removes the local variable) gives me the same answer (19594) for both.
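
The effect can be reproduced without depending on the JIT by making the two precisions explicit. This is a sketch under an assumption: double stands in for "a register wider than r4" (the real register may be wider still), and the class and method names are invented for the illustration. 195.95F is stored as 195.9499969482421875, so the exact product is 19594.99969482421875. Rounding that to the nearest float gives exactly 19595, while truncating it directly gives 19594.

```csharp
using System;

static class IntermediatePrecisionSketch
{
    // Truncates the product while it is still held at double precision,
    // standing in for a value that never leaves a wide register.
    public static int TruncateWide(float f, int i)
    {
        double wide = (double)f * i;
        return (int)wide;
    }

    // Rounds the same product to float32 first, as forcing it into a
    // float variable does, and only then truncates.
    public static int TruncateNarrow(float f, int i)
    {
        float narrow = (float)((double)f * i);
        return (int)narrow;
    }

    static void Main()
    {
        Console.WriteLine(TruncateWide(195.95F, 100));   // 19594
        Console.WriteLine(TruncateNarrow(195.95F, 100)); // 19595
    }
}
```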

edited Jun 19 '09 at 14:45

answered Jun 19 '09 at 14:39

Marc Gravell

Interesting. Do you think this might happen the same way in both x86 and 64-bit machines? Do the registers differ in width? – Erich Mirabal Jun 19 '09 at 14:46
Yes they do, but I couldn't predict what would happen on x64 I'm afraid... I would *guess* it'll behave pretty much the same, since the variable is still r4 and the registers are "at least" r4... – Marc Gravell Jun 19 '09 at 14:49
Note that what will *really* bake your noodle is that it matters less which CPU you run on than which *hardware unit* the JIT compiler targets when given the option of keeping variables as nativefloats. The SSE hardware versus the x87 hardware have differing behaviours in this regard, for example. First rule of FP: if exactness is exactly what you want, worry :) – ShuggyCoUk Jun 19 '09 at 14:54

score 2 · Answer 2 · answered Jun 19 '09 at 14:33

This code...

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            float result = 195.95F*100;
            int intresult = (int)(195.95F * 100);
        }
    }
}

gives this IL:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       14 (0xe)
  .maxstack  1
  .locals init ([0] float32 result,
           [1] int32 intresult)
  IL_0000:  nop
  IL_0001:  ldc.r4     19595.
  IL_0006:  stloc.0
  IL_0007:  ldc.i4     0x4c8a
  IL_000c:  stloc.1
  IL_000d:  ret
} // end of method Program::Main

Look at IL_0001: the compiler has already done the calculation, loading 19595. for the float and 0x4c8a (19594) for the int. Otherwise it is the usual decimal-to-binary conversion problem.

answered Jun 19 '09 at 14:33

Andrea Balducci

it's only a base conversion issue due to different storage (r4 vs i4) – Andrea Balducci Jun 19 '09 at 14:37
The problem still happens if you take the decision away from the compiler - see my answer. – Marc Gravell Jun 19 '09 at 14:40
If I turn off optimization I get the same results, 95/94. The difference is in the way the values are stored (r4 vs i4), plus the common problem of representing base-10 decimal data in base 2. – Andrea Balducci Jun 19 '09 at 15:30

score 2 · Answer 3 · answered Jun 19 '09 at 14:52

Marc's answer is correct in that it is the conversion between nativefloat and float32/float64.

This is covered in the CLR ECMA spec but David Notario explains this far better than I could.
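
One practical consequence, sketched here with hedging: if a consistent result is wanted, an explicit (float) cast on the product asks for it to be narrowed to float32 before the (int) conversion truncates it, whether or not the intermediate value was held at higher precision. The class and method names are hypothetical, and this relies on the compiler emitting a narrowing conversion for the explicit cast.

```csharp
using System;

static class ForcedNarrowingSketch
{
    // The explicit (float) cast requests rounding to float32 before the
    // truncating (int) conversion runs.
    public static int MultiplyThenTruncate(float f, int i)
    {
        return (int)(float)(f * i);
    }

    static void Main()
    {
        Console.WriteLine(MultiplyThenTruncate(195.95F, 100)); // 19595
    }
}
```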

answered Jun 19 '09 at 14:52

ShuggyCoUk

We also call this out in the C# specification: "Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an “extended” or “long double” floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. [...] C# allows a higher precision type to be used for all floating-point operations." – Eric Lippert Jun 19 '09 at 15:50
Ah yes, forgot about that one too. I guess it is one of those cases where there are two compilers and they both get a say in this sort of thing. – ShuggyCoUk Jun 19 '09 at 15:58

score 1 · Answer 4 · answered Jun 19 '09 at 14:40

Try converting float to double in your second example:

double flo = 195.95F * 100;
int num = (int) flo;

I'm guessing in your first example the compiler is using double to hold the intermediate result, and so in the float case you're losing precision.

answered Jun 19 '09 at 14:40

Jim Ferrans

Your guess is on the right track. But it's not the compiler, it's the processor that makes this optimization. And it's not a double; it can actually be even higher precision than double on some processors. – Eric Lippert Jun 19 '09 at 15:53

score 0 · Answer 5 · answered Jun 19 '09 at 14:27

When you multiply by 100, that is an integer, so it is doing an implicit conversion at that step. If you put an "F" behind the 100, I'll bet they'd be the same.

I typically use boxing/unboxing with the parentheses when it is a reference type. When it is a value type, I try to use the Convert static methods.

Try Convert.ToSingle(YourNumber); for a more reliable conversion.

HTH

answered Jun 19 '09 at 14:27

Rap

Thank you, I know how to convert... I was just wondering why assigning the variable and then casting works differently from just casting. – CodeClimber Jun 19 '09 at 14:32
Why would that make any difference? The problem is with the conversion to float and subsequent multiplication, and writing it out in more words won't change anything. – David Thornley Jun 19 '09 at 14:35

score 0 · Answer 6 · answered Jun 19 '09 at 14:31

I can't answer why the second one works and the first one doesn't. However, I can tell you that 195.95 has a non-terminating expansion in binary, and as such round-off errors like this one are bound to happen.

Try converting to a double rather than float. You could also use a money or a decimal type rather than a float. That will store the number differently and more accurately.
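
As a rough illustration of the decimal suggestion: decimal stores base-10 digits, so 195.95m is held exactly and multiplying by 100 yields exactly 19595.00. The helper name below is made up for this sketch, and it assumes the value is available as a decimal from the start. A value that arrives as a float has already been rounded in binary before any conversion to decimal.

```csharp
using System;

static class DecimalSketch
{
    // decimal arithmetic is base-10, so 195.95m * 100 is exactly 19595.00
    // and the truncating cast has no fractional part to drop.
    public static int ToHundredths(decimal amount)
    {
        return (int)(amount * 100);
    }

    static void Main()
    {
        Console.WriteLine(ToHundredths(195.95m)); // 19595
    }
}
```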

For more on floating point numbers and the IEEE representation, go here:

http://en.wikipedia.org/wiki/IEEE_754

answered Jun 19 '09 at 14:31

ristonj

A double might work by accident, but given an imprecise number there's no reason why a more accurate representation would be more or less than the real value. We don't care in this case whether it's below the correct value by a float epsilon or a double epsilon, just that it's below. – David Thornley Jun 19 '09 at 14:33
I receive the value as float, have to multiply by 100 and then store it as Integer. I'm not converting anything to a float in my code. – CodeClimber Jun 19 '09 at 14:34
