Floating-point arithmetic

Question

Today in my C++ programming lessons, my professor told me that one should never compare two floating-point values directly.

So I tried this piece of code and found out the reason for his statement.

double l_Value = 94.9;
printf("%.20lf", l_Value);

The result was 94.89999999 (a small rounding error).

I understand that floating-point numbers are not stored exactly as they are written in the code. Squeezing them into a finite number of binary digits introduces rounding errors.

I am looking for solutions to two problems:
1. An efficient way to compare two floating-point values.
2. How to add one floating-point value to another. For example, add 0.1111 to 94.4345 and get exactly 94.5456.

Thanks in advance.

Read [this SO article](http://stackoverflow.com/questions/588004/is-floating-point-math-broken). — Jabberwocky, Apr 10 '17 at 16:05
"one should never compare two floating point values directly." If your professor said that, get a new professor. What he should have said is never check two floating point values for **equality** directly. There's no issue with comparing them to see if one is greater than or less than the other. — JeremyP, Apr 10 '17 at 16:11
You've gotten two answers so far that suggest using "nearly equal" instead of "equal". The problem with that is that if a is "nearly equal" to b and b is "nearly equal" to c, it doesn't follow that a is "nearly equal" to c. This will bite you. There is no simple solution; people get PhD's in numerical computation, i.e., figuring out how to get reasonable results from inherently messy techniques. The first thing to do is abandon your intuition; floating-point math is **not** like real numbers, so you can't transfer your experience in real-world arithmetic to floating-point math. — Pete Becker, Apr 10 '17 at 16:20
"my proff told me that one should never compare two floating point values directly." --> A better prof would explain why to avoid this and also when, in select cases, it is OK. — chux - Reinstate Monica, Apr 10 '17 at 16:50
@Chux thanks for the answers people. JeremyP's comment, "if your value is an integer or fractional part has a denominator that is a power of 2" answered every question I had. — AstroMax, May 06 '17 at 02:28
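
Pete Becker's point that "nearly equal" is not transitive can be illustrated with a minimal sketch. The helper name `nearly_equal` and the tolerance 0.25 are assumptions made for this illustration, not something from the thread; the values were picked so that every subtraction is exact in binary.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the thread): absolute-tolerance comparison.
bool nearly_equal(double a, double b, double tol) {
    return std::fabs(a - b) <= tol;
}

// Self-check illustrating the non-transitivity. The values and the tolerance
// are sums of powers of two, so every subtraction here is exact.
void demo_not_transitive() {
    const double a = 1.0, b = 1.25, c = 1.5, tol = 0.25;
    assert(nearly_equal(a, b, tol));   // a is "nearly equal" to b
    assert(nearly_equal(b, c, tol));   // b is "nearly equal" to c
    assert(!nearly_equal(a, c, tol));  // yet a is not "nearly equal" to c
}
```

Calling `demo_not_transitive()` from `main` runs the three checks; any code that chains such comparisons (sorting, grouping, deduplication) inherits this problem.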

chux - Reinstate Monica · Accepted Answer · 2017-04-10T22:18:38.110

> Efficient way to compare two floating values.

A simple double a,b; if (a == b) is an efficient way to compare two floating values. Yet as OP noticed, this may not meet the overall coding goal. Better ways depend on the context of the compare, something not supplied by OP. See far below.

> How to add a floating value to another one. Example. Add 0.1111 to 94.4345 to get the exact value as 94.5456

Floating-point constants in source code have effectively unlimited range and precision, such as 1.23456789012345678901234567890e1234567. Converting this text to a double is typically limited to one of about 2⁶⁴ different values. The closest is selected, but it may not be an exact match.

None of 0.1111, 94.4345, or 94.5456 can be represented exactly as a typical double.

OP has choices:

1) Use a type other than double or float. Various libraries offer decimal floating-point types.

2) Limit code to the rare platforms where double uses base 10, that is, where FLT_RADIX == 10.

3) Write your own code to read user input like "0.1111" into a structure or string and perform the needed operations on it.

4) Treat user input as strings and then convert them to some integer type, again with supporting routines to read, compute, and write.

5) Accept that floating-point operations are not mathematically exact and handle round-off error.

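#include <float.h>
#include <stdio.h>

int main(void) {
  // DBL_DECIMAL_DIG significant digits are enough to distinguish every double.
  double a = 0.1111;
  printf("a:   %.*e\n", DBL_DECIMAL_DIG - 1, a);
  double b = 94.4345;
  printf("b:   %.*e\n", DBL_DECIMAL_DIG - 1, b);
  double sum = a + b;
  printf("sum: %.*e\n", DBL_DECIMAL_DIG - 1, sum);
  printf("%.4f\n", sum);
}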

Output

a:   1.1110000000000000e-01
b:   9.4434500000000000e+01
sum: 9.4545599999999993e+01
94.5456  // Desired textual output based on a rounded `sum` to the nearest 0.0001
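
When the goal is a comparison rather than printed text, the same rounding to the nearest 0.0001 can be done numerically. The following is a minimal sketch; the helper name `to_ten_thousandths` is an assumption for illustration, and it assumes the scaled value fits in a `long long`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: round a value to the nearest 0.0001 and return it as
// an integer count of ten-thousandths. Assumes the result fits in long long.
long long to_ten_thousandths(double x) {
    return std::llround(x * 10000.0);
}

void demo_rounding() {
    const double sum = 0.1111 + 94.4345;  // not exactly 94.5456 as a double
    assert(to_ten_thousandths(sum) == 945456);
    // After rounding, the computed sum and the literal agree.
    assert(to_ten_thousandths(sum) == to_ten_thousandths(94.5456));
}
```

This only helps when the accumulated error is much smaller than half of the rounding step; it does not make the underlying arithmetic exact.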

More on #1

If an exact compare is not sought but some sort of "are the two values close enough?", a definition of "close enough" is needed - of which there are many.

The following "close enough" test measures the distance between the two numbers in ULPs (units in the last place). The distance is linear in the difference when the values lie within the same power-of-two range and becomes logarithmic otherwise. Of course, a change of sign is an issue.

float example:
Consider all finite float values ordered from most negative to most positive. The following somewhat-portable code returns an integer for each float that preserves that order.

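#include <assert.h>
#include <stdint.h>

// Map each finite float to an integer that preserves the floats' ordering.
uint32_t sequence_f(float x) {
  union {
    float f;
    uint32_t u32;
  } u;
  assert(sizeof(float) == sizeof(uint32_t));
  u.f = x;
  if (u.u32 & 0x80000000) {
    // Negative: clear the sign bit and mirror the magnitude below the midpoint.
    u.u32 ^= 0x80000000;
    return 0x80000000 - u.u32;
  }
  // Non-negative: shift above the midpoint, so +0.0 and -0.0 map to the same value.
  return u.u32 + 0x80000000;
}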

Now, to determine whether two float values are "close enough", simply compare the two integers.

static bool close_enough(float x, float y, uint32_t ULP_delta) {
  uint32_t ullx = sequence_f(x);
  uint32_t ully = sequence_f(y);
  if (ullx > ully) return (ullx - ully) <= ULP_delta;
  return (ully - ullx) <= ULP_delta;
}
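
Since the question is about C++, where reading a union member other than the one last written is not well defined, here is a sketch of the same idea using `std::memcpy` to read the bits. It assumes a 32-bit IEEE-754 float; the `demo_ulp` function and its test values are additions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a finite float to an integer that preserves ordering.
// Assumes float is a 32-bit IEEE-754 binary32 value.
std::uint32_t sequence_f(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);  // well-defined way to read the bits in C++
    if (u & 0x80000000u) {
        // Negative: mirror the magnitude below the midpoint.
        return 0x80000000u - (u ^ 0x80000000u);
    }
    return u + 0x80000000u;  // non-negative: shift above the midpoint
}

// True when x and y are at most ulp_delta representable floats apart.
bool close_enough(float x, float y, std::uint32_t ulp_delta) {
    const std::uint32_t sx = sequence_f(x);
    const std::uint32_t sy = sequence_f(y);
    return (sx > sy ? sx - sy : sy - sx) <= ulp_delta;
}

void demo_ulp() {
    const float one = 1.0f;
    const float next = std::nextafter(one, 2.0f);  // adjacent float above 1.0f
    assert(close_enough(one, next, 1));    // neighbours are 1 ULP apart
    assert(!close_enough(one, 1.5f, 1));   // far more than 1 ULP apart
    assert(close_enough(0.0f, -0.0f, 0));  // both zeros map to the same integer
}
```

NaN and infinities are not given any special treatment here, so callers that may see them would need to check with `std::isfinite` first.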

score 0 · Answer 2 · answered Apr 10 '17 at 16:04

The way I've usually done this is to have a custom equality comparison function. The basic idea is that you pick a certain tolerance, say 0.0001. Then you subtract your two numbers and take the absolute value of the difference, and if it is less than your tolerance you treat the numbers as equal. There are other strategies that may be more appropriate for certain situations, of course.

answered Apr 10 '17 at 16:04 by sovemp

The problem with checking against a threshold is that the choice of the epsilon is always arbitrary. Ask yourself if you have a good reason that you need to check for equality first. – nucleon Apr 10 '17 at 16:07
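
To make the concern about an arbitrary epsilon concrete, the sketch below contrasts the fixed tolerance from the answer with a relative tolerance. The relative variant is a commonly used alternative that nobody in this thread proposed; the function names and the tolerance values are assumptions for illustration, and the comments about spacing assume IEEE-754 binary64 doubles.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Absolute tolerance, as described in the answer above.
bool equal_abs(double a, double b, double tol) {
    return std::fabs(a - b) <= tol;
}

// Relative tolerance (an alternative, not from this thread): the allowed
// difference scales with the larger magnitude of the two inputs.
bool equal_rel(double a, double b, double rel_tol) {
    return std::fabs(a - b) <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}

void demo_tolerance_choice() {
    // Small values: 2e-6 is twice 1e-6, yet a fixed 0.0001 calls them equal.
    assert(equal_abs(1e-6, 2e-6, 1e-4));
    assert(!equal_rel(1e-6, 2e-6, 1e-9));

    // Large values: adjacent doubles near 1e15 are 0.125 apart, so the same
    // fixed tolerance calls direct neighbours unequal.
    const double big = 1e15;
    const double next = std::nextafter(big, std::numeric_limits<double>::infinity());
    assert(!equal_abs(big, next, 1e-4));
    assert(equal_rel(big, next, 1e-9));
}
```

A relative tolerance has its own weak spot near zero, where the allowed difference shrinks to nothing, so neither variant removes the need to think about what "equal" should mean for the data at hand.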

score 0 · Answer 3 · answered Apr 10 '17 at 16:05

Define for yourself a tolerance level e (for example, e = 0.0001) and check whether abs(a - b) <= e.
You aren't going to get an "exact" value with floating point. Ever. If you know in advance that you are using four decimals, and you want "exact", then you need to internally treat your numbers as integers and only display them as decimals: 944345 + 1111 = 945456.
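
The integer approach might look like the following minimal sketch. The scale constant `kScale` and the helper `to_decimal` are hypothetical names chosen for this example, and the formatter assumes non-negative values to stay short.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical fixed-point sketch: values are stored as counts of 1/10000.
constexpr std::int64_t kScale = 10000;

// Format a scaled value as decimal text with four fractional digits.
// Assumes a non-negative value, to keep the sketch short.
std::string to_decimal(std::int64_t scaled) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%lld.%04lld",
                  static_cast<long long>(scaled / kScale),
                  static_cast<long long>(scaled % kScale));
    return buf;
}

void demo_fixed_point() {
    const std::int64_t a = 944345;  // represents 94.4345
    const std::int64_t b = 1111;    // represents  0.1111
    assert(a + b == 945456);        // integer addition is exact
    assert(to_decimal(a + b) == "94.5456");
}
```

Addition and subtraction stay exact as long as the integers do not overflow; multiplication and division need an explicit rescaling and rounding step.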

answered Apr 10 '17 at 16:05 by hymie

"You aren't going to get an "exact" value with floating point. Ever." Yes you are, if your value is an integer or the fractional part has a denominator that is a power of 2. – JeremyP Apr 10 '17 at 16:13
That's a very very small subset of the range of floating-point numbers. But if you want to depend on that, go right ahead. – hymie Apr 10 '17 at 19:29
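
JeremyP's observation can be checked directly. This is a small sketch, assuming typical IEEE-754 binary64 doubles; the function name is made up for the example.

```cpp
#include <cassert>

void demo_exact_values() {
    // Fractions whose denominator is a power of two are stored exactly
    // (within the type's range and precision), so these sums are exact.
    assert(0.5 + 0.25 == 0.75);
    assert(94.0 + 0.125 == 94.125);

    // 0.1, 0.2 and 0.3 are not exactly representable in binary. On typical
    // IEEE-754 double arithmetic the rounded sum differs from the rounded 0.3.
    assert(0.1 + 0.2 != 0.3);
}
```

The first two comparisons are safe with `==` because no rounding occurs anywhere; the third shows how quickly that guarantee disappears for ordinary decimal fractions.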
