
I define a floating-point number as `float transparency = 0.85f;`. On the next line, I pass it to a function, `fcn_name(transparency)`, but it turns out that the variable `transparency` has the value 0.850000002, and when I print it with the default settings, it shows 0.850000002. For the value `0.65f`, it is 0.649999998.

How can I avoid this issue? I know floating point is just an approximation, but if I define a float with only a few decimal places, how can I make sure its value is not changed?

jscs
user565739
  • You can't. You get the closest representable floating point number to the constant you put in your source. You can mitigate the effect by using `double`s instead of `float`s. – Daniel Fischer Oct 23 '12 at 21:35
  • Use a type with higher precision (if you need it!). You may even consider **not using the [0..1] range but the [0..100] range**, so you will have a better approximation for the numbers you're managing. You can divide by 100 at the end of all your calculations. – Adriano Repetti Oct 23 '12 at 21:35
  • 4
    Go read [What Ever Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html). – Adam Rosenfield Oct 23 '12 at 21:46
  • If you are always working with 2 decimal places, you are free to convert your code to use integers (just scale everything by 100). – paddy Oct 23 '12 at 21:50
  • The key to understanding this is that the mantissa and exponent of a floating-point number are represented in binary. So there isn't always an exact representation for even simple floating-point values. Changing to double will reduce but not eliminate this problem; you'll just get more 0's. – Rafael Baptista Oct 23 '12 at 21:51
  • 3
  • These values cannot be represented precisely in binary floating-point format regardless of how large your floating-point type is. Switching to `double` will reduce the error, but the error will still be there. You can concoct a 64-kilobyte floating-point type, and the error will still be there, simply because the representation of `0.65` in floating-point binary has *infinite* length. – AnT stands with Russia Oct 23 '12 at 22:17
  • Do you actually need the higher precision for your calculations, or do you just need a way to print the numbers with 2 decimal places of precision? – japreiss Oct 23 '12 at 23:42
  • The issue here is your expectation of what should happen. – harold Oct 24 '12 at 13:15

2 Answers


Floating-point values represented in binary format do not have any specific decimal precision. Just because you read in some spec that the type can represent a fixed number of decimal digits doesn't really mean much: it is just a rough conversion of the physical (and meaningful) binary precision into a much less meaningful decimal approximation.

One property of binary floating-point format is that it can only represent precisely (within the limits of its mantissa width) the numbers that can be expressed as finite sums of powers of 2 (including negative powers of 2). Numbers like 0.5, 0.25, 0.75 (decimal) will be represented precisely in binary floating-point format, since these numbers are either powers of 2 (2^-1, 2^-2) or sums thereof.

Meanwhile, a number such as decimal 0.1 cannot be expressed as a finite sum of powers of 2. The representation of decimal 0.1 in binary floating point has infinite length, which immediately means that 0.1 can never be represented precisely in a finite binary floating-point format. Note that 0.1 has only one decimal digit, yet the number is still not representable. This illustrates the fact that expressing floating-point precision in terms of decimal digits is not very useful.
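A quick way to see this (a minimal check, not part of the original answer) is to print both kinds of values with more digits than the default:

    #include <stdio.h>

    int main(void)
    {
        /* 0.75 = 2^-1 + 2^-2, a finite sum of powers of 2: stored exactly. */
        printf("%.9f\n", 0.75f);  /* prints 0.750000000 */

        /* 0.1 has an infinite binary expansion: stored only approximately. */
        printf("%.9f\n", 0.1f);   /* typically prints 0.100000001 */

        return 0;
    }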

Values like 0.85 and 0.65 from your example are also non-representable, which is why you see these values distorted after conversion to a finite binary floating-point format. Actually, you have to get used to the fact that most fractional decimal numbers you will encounter in everyday life will not be representable precisely in binary floating-point types, regardless of how large these floating-point types are.
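For the specific values from the question, a short test shows the nearest representable neighbors that actually get stored (the exact printed digits assume a typical IEEE-754 `float`):

    #include <stdio.h>

    int main(void)
    {
        float a = 0.85f;
        float b = 0.65f;

        /* The stored values are the nearest representable floats,
           typically 0.850000024 and 0.649999976 on IEEE-754 systems. */
        printf("%.9f\n", a);
        printf("%.9f\n", b);

        return 0;
    }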

AnT stands with Russia

The only way I can think of to solve this problem is to pass the characteristic and mantissa to the function separately and let *it* set the values appropriately.
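A minimal sketch of that idea (the function name and the two-integer convention are made up for illustration): pass the value as exact integers and convert only at the point of use.

    #include <stdio.h>

    /* Hypothetical receiver: takes the whole part and the fractional
       part in hundredths, e.g. (0, 85) for 0.85. */
    void set_transparency(int whole, int hundredths)
    {
        /* Both arguments are exact integers; only this final
           division introduces any rounding. */
        double t = whole + hundredths / 100.0;
        printf("%f\n", t);
    }

    int main(void)
    {
        set_transparency(0, 85);  /* instead of passing an inexact 0.85f */
        return 0;
    }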

Also, if you want more precision, [this Dr. Dobb's article on fixed-point arithmetic types](http://www.drdobbs.com/cpp/fixed-point-arithmetic-types-for-c/184401992) is the one I know of, though it works for C++ only. (I am searching for an equivalent C implementation.)

I tried this on VS2010,

    #include <stdio.h>

    /* Prints a float using %f, which by default rounds
       to six decimal places. */
    void printfloat(float f)
    {
        printf("%f", f);
    }

    int main(int argc, char *argv[])
    {
        float f = 0.24f;
        printfloat(f);
        return 0;
    }

OUTPUT: 0.240000
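Note, though, that the default `%f` rounds to six decimal places, which is exactly what hides the error here: 0.24 is not exactly representable in binary floating point either. Asking for more digits makes the error visible again (the printed digits assume a typical IEEE-754 `float`):

    #include <stdio.h>

    int main(void)
    {
        float f = 0.24f;
        /* Nine decimal places reveal the nearest representable value,
           typically 0.239999995 on IEEE-754 systems. */
        printf("%.9f\n", f);
        return 0;
    }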
Aniket Inge
  • This is not a floating point versus fixed point issue. It is a radix issue. For example, if you had a binary fixed-point format with 8 integer bits and 8 fraction bits, it would not be able to represent .85 exactly. – Eric Postpischil Oct 24 '12 at 13:05
  • @EricPostpischil which is why I asked him to send the characteristic and mantissa separately. For example, the characteristic here would be 0 and the mantissa would be 75; let the program handle this if he wants accuracy. – Aniket Inge Oct 24 '12 at 13:07
  • Asking him to pass the characteristic and mantissa separately does not correct the problem that the first sentence in the answer is false. I suggest you edit the answer to remove the false statement “Alas, that’s the problem with floating point numbers, as opposed to fixed point numbers.” (Additionally, I do not see reason to use the code style for the phrases “floating point” and “fixed point”. They are not code. For emphasis, you can use **bold** or *italic*.) – Eric Postpischil Oct 24 '12 at 13:09