
I have read multiple articles about comparing floating-point variables, but I failed to understand them and to get the knowledge I needed from them. So here I am, posting this question.

What is a good way to compare two float variables? Below is a code snippet:

#include <cmath>
#include <iostream>
using namespace std;

#define EPSILON_VALUE 0.0000000000000001

bool cmpf(float A, float B)
{
  return (fabs(A - B) < EPSILON_VALUE);
}

int main()
{
  float a = 1.012345679, b = 1.012345678;

  if (cmpf(a, b)) cout << "same" << endl;
  else            cout << "different" << endl;

  return 0;
}

The output: same, although both float variables hold different values.

pkthapa
  • all your code does is make sure that two floats which are not exactly the same are considered the same, hence I don't understand what else you expected... What is the question? The one you asked you already answered yourself – 463035818_is_not_an_ai May 08 '19 at 11:27
  • 1
    @user463035818 as you can see that both float variables (a & b) hold different values, but the output printed by the code is 'same' instead of 'different'. – pkthapa May 08 '19 at 11:30
  • well, that's the whole purpose of that `cmpf` function. Maybe your misunderstanding is this: when you compare floats that are exactly the same, you will get `true` from comparing them via `==`; it just almost never happens that two floating-point values resulting from calculations are exactly the same. Then you often still need a way to say "ok, they are not exactly the same, but within some acceptable error I can consider them to be the same" – 463035818_is_not_an_ai May 08 '19 at 11:32
  • 1
    It is wholly dependent on context. Sometimes comparing with an additive tolerance is appropriate, sometimes a multiplicative one is better. Sometimes `==` will do the job. You need to study your algorithm and how the errors, if any, will accumulate. The linked duplicate is a good starting point. – Bathsheba May 08 '19 at 11:33
  • @user463035818: Another myth, I'm afraid, especially under IEEE 754. – Bathsheba May 08 '19 at 11:36
  • You wrote a function that returns `true`, and then you are asking why it returns `true` instead of `false`. – L. F. May 08 '19 at 11:36
  • 1
    @user463035818: `1 - 1.0 / 3 - 1.0 / 3 - 1.0 / 3` is perhaps a better one. – Bathsheba May 08 '19 at 11:37
  • @L.F. Where did you see the function always returns true? – pkthapa May 08 '19 at 11:41
  • @PankajKumarThapa Nowhere did I see the function always returns true ;-) I mean you wrote a function `cmpf` so that `cmpf(a, b)` returns `true`, and then you are asking why it returns `true` instead of `false`. – L. F. May 08 '19 at 11:43
  • @L.F. if you look at the float variables' values, you see different values. With this information, 'different' should be printed, not 'same'. – pkthapa May 08 '19 at 11:47
  • @PankajKumarThapa In case you still don't understand: you wrote a function `cmpf` so that `cmpf(a, b)` returns `true` when `a` and `b` are close but different values, and then you are asking why it returns `true` instead of `false`. What's the next point I need to clarify? ;-) – L. F. May 08 '19 at 11:48
  • @L.F. okay. I got your point. My question is how can I get 'different' with the same input values (a = 1.012345679, b = 1.012345678)? – pkthapa May 08 '19 at 11:50
  • @PankajKumarThapa But you just said that `a` and `b` are different values yourself in your last comment ;-) – L. F. May 08 '19 at 11:51
  • @L.F. if I give a = 1.01234567, b = 1.01234564, then I get 'different'. Note that there are 8 digits after the decimal point. If I add one more digit after the decimal point, the whole comparison goes for a toss. – pkthapa May 08 '19 at 11:51
  • @L.F. don't get confused; by 'same' I meant the same input data, with different values in the variables. :D – pkthapa May 08 '19 at 11:53
  • @πάνταῥεῖ: This question is not a duplicate of [that question](https://stackoverflow.com/questions/588004/is-floating-point-math-broken). Please do not promiscuously mark questions as duplicates. – Eric Postpischil May 08 '19 at 11:53
  • You use the float type. It only keeps about 7 significant digits, so a and b are the same. You can see it if you `cout << a-b`. You will see more of a difference if you define `a`, `b`, and `cmpf` with `double`. – Arno Bozo May 08 '19 at 12:14
  • The most important thing to keep in mind when you're thinking about floating-point operations is that floating-point numbers **are not like real numbers**, and all of the intuition about decimal numbers that you've developed over your lifetime will mislead you. – Pete Becker May 08 '19 at 12:18
  • 1
    FWIW, the exact values `1.012345679` and `1.012345678`, when stored as floats, are both stored as the approximated value `1.01234567165374755859375`, so **when stored as floats, they are the exact same values**. They **should** return `"same"`, even without the `EPSILON_VALUE`. Note that your epsilon is below the precision a float can give anyway. Also, when you subtract nearly equal values, you should expect [*catastrophic cancellation*](https://en.wikipedia.org/wiki/Loss_of_significance), which can be a problem too. – Rudy Velthuis May 08 '19 at 19:49
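
To make the points in the comments above concrete, here is a small sketch (not from the original thread): it prints the values actually stored in a and b, evaluates Bathsheba's 1 - 1.0/3 - 1.0/3 - 1.0/3 example, and compares with a multiplicative (relative) tolerance instead of the question's additive one. The helper name nearly_equal and the tolerance 1e-6f are illustrative choices, not a recommendation:

#include <cmath>
#include <cstdio>

// Relative (multiplicative) tolerance: scales with the magnitude of the
// inputs, unlike the fixed additive EPSILON_VALUE in the question.
bool nearly_equal(float a, float b, float rel_tol)
{
    return std::fabs(a - b) <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

int main()
{
    float a = 1.012345679f, b = 1.012345678f;

    // Both literals round to the same float, so the difference is exactly 0.
    std::printf("a = %.25f\n", a);
    std::printf("b = %.25f\n", b);
    std::printf("a == b: %s\n", a == b ? "true" : "false");        // true

    // Accumulated rounding error: mathematically 0, but not in double.
    std::printf("1 - 1.0/3 - 1.0/3 - 1.0/3 = %g\n",
                1 - 1.0 / 3 - 1.0 / 3 - 1.0 / 3);

    std::printf("nearly_equal: %s\n",
                nearly_equal(a, b, 1e-6f) ? "true" : "false");     // true
}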

3 Answers


There is no general solution for comparing floating-point numbers that contain errors from previous operations. The code that must be used is application-specific. So, to get a proper answer, you must describe your situation more specifically.

The underlying problem is that performing a correct computation using incorrect data is in general impossible. If you want to compute some function of two exact mathematical values x and y, but the only data you have are some inexactly computed values x' and y', it is generally impossible to compute the exactly correct result. For example, suppose you want to know the sum x+y, but you only know x' is 3 and y' is 4, and you do not know what the true, exact x and y are. Then you cannot compute x+y.

If you know that x' and y' are approximately x and y, then you can compute an approximation of x+y by adding x' and y'. This works when the function being computed (+ in this example) has a reasonable derivative: slightly changing the inputs of a function with a reasonable derivative slightly changes its outputs. This fails when the function you want to compute has a discontinuity or a large derivative. For example, if you want to compute the square root of x (in the real domain) using an approximation x', but x' might be negative due to previous rounding errors, then computing sqrt(x') may produce an exception. Similarly, comparing for inequality or order is a discontinuous function: a slight change in inputs can change the answer completely (from false to true or vice versa).
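
As an illustration of the square-root case (a sketch, not code from the answer; the name safe_sqrt is made up), a common defensive pattern is to clamp a value that is mathematically guaranteed to be non-negative before hitting the discontinuity:

#include <algorithm>
#include <cmath>

// x_approx is mathematically >= 0, but earlier rounding errors may have
// left it slightly negative; clamp before taking the square root.
double safe_sqrt(double x_approx)
{
    return std::sqrt(std::max(x_approx, 0.0));
}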

The common bad advice is to compare with a “tolerance”. This method trades false negatives (incorrect rejections of numbers that would satisfy the comparison if the exact mathematical values were compared) for false positives (incorrect acceptance of numbers that would not satisfy the comparison).

Whether or not an application can tolerate false acceptances depends on the application. Therefore, there is no general solution.

The level of tolerance to set, and even the nature by which it is calculated, depend on the data, the errors, and the previous calculations. So, even when it is acceptable to compare with a tolerance, the amount of tolerance to use and how to calculate it depend on the application. There is no general solution.
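
For illustration only (a sketch; the names close_abs and close_rel are made up), here is how the two common tolerance schemes behave at different magnitudes, which is one reason no single choice is generally right:

#include <cmath>
#include <cstdio>

// Additive (absolute) tolerance.
bool close_abs(double a, double b, double tol)
{
    return std::fabs(a - b) <= tol;
}

// Multiplicative (relative) tolerance.
bool close_rel(double a, double b, double tol)
{
    return std::fabs(a - b) <= tol * std::fmax(std::fabs(a), std::fabs(b));
}

int main()
{
    // An absolute tolerance that looks reasonable near 1.0 ...
    std::printf("%d\n", close_abs(1.0, 1.0 + 1e-12, 1e-9));   // 1
    // ... calls clearly different tiny values "equal" ...
    std::printf("%d\n", close_abs(1e-12, 2e-12, 1e-9));       // 1
    // ... and calls nearly identical huge values "different".
    std::printf("%d\n", close_abs(1e12, 1e12 + 1, 1e-9));     // 0
    std::printf("%d\n", close_rel(1e12, 1e12 + 1, 1e-9));     // 1
}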

Eric Postpischil
  • The most precise thing is not to use floating-point numbers at all. Use integers and fractions. – Michael Chourdakis May 08 '19 at 12:22
  • 1
    @MichaelChourdakis: How do you compute a square root or a cosine with integers or fractions? – Eric Postpischil May 08 '19 at 12:23
  • You don't. If your calculations include real numbers, you have lost precision anyway. The idea is to avoid floating-point variables when you can; for example, in one of my projects, music duration is expressed with a fraction, not a floating-point number. By the way, the comment was meant for the question, but I misclicked. Your answer is correct. – Michael Chourdakis May 08 '19 at 12:26
  • @EricPostpischil - there are techniques for computing the "integer square root" (for a positive integral value `x`, this is the largest positive integral value `y` such that `y * y <= x`) using only integral operations - no floating point. Fractions (aka rational values) can be represented using a pair of integers, and operations on the pair. That said, I disagree with advice to not use floating point at all - like anything in software development, floating point has its uses. – Peter May 08 '19 at 12:43
  • 1
    @MichaelChourdakis: “You don’t” is not an acceptable solution for many algorithms. Any VR, AR, or physics modeling needs square roots and trigonometric functions. Much scientific and mathematical work needs those and other mathematical functions which are not sufficiently served with integer or rational arithmetic. Dismissing floating-point is not practical. – Eric Postpischil May 08 '19 at 13:30
  • @Eric: "How do you compute a square root or a cosine with integers or fractions?" Huh? That is possible. It is slow, but both my BigDecimal as well as my BigRational can do it. And how? The same (or similar) way it is done for FP, just not in hardware. – Rudy Velthuis May 08 '19 at 19:58
  • 1
    @RudyVelthuis: With integer arithmetic, what is the square root of two? How is that more precise than with floating-point? With rational arithmetic, what is the cosine of 4/5? – Eric Postpischil May 08 '19 at 20:14
  • @Eric: I didn't say plain integer arithmetic, I said BigRational or Bigdecimal. With precision set to 128, BigDecimal will produce: `1.4142135623730950488016887242096980785696718753769480731766797379`. Source code for BigDecimal.Sqrt: https://github.com/rvelthuis/DelphiBigNumbers/blob/master/Source/Velthuis.BigDecimals.pas#L2260, a simple Newton–Raphson algorithm. BigRational (not fully functional at the moment) uses a continued fraction algorithm. All these use (fractions of) integers. – Rudy Velthuis May 08 '19 at 20:48
  • 1
    @RudyVelthuis: Remember the context here— Michael Chourdakis’s comment to use integers or fractions. Using BigDecimal is floating-point arithmetic, regardless of whether it is implemented with integers. The issue is that integer or rational arithmetic is inadequate for many tasks for which floating-point is suitable. – Eric Postpischil May 08 '19 at 20:52
  • With integers: BigDecimal consists of a BigInteger and a scale. The BigInteger is scaled as necessary (by powers of 10), the integer square root is calculated and the result is scaled back (rounding where appropriate). – Rudy Velthuis May 08 '19 at 20:52
  • 1
    @RudyVelthuis: See that “scale” you write of in BigDecimal? That is the “floating” of floating-point. It is not integer arithmetic. It is floating-point arithmetic. – Eric Postpischil May 08 '19 at 20:53
  • @Eric: it is possible using (continued) fractions (which are fractions of -- usually small -- integers). Integers are used where appropriate. It isn't fast, though. – Rudy Velthuis May 08 '19 at 20:55
  • @Eric: basically, it is all integer arithmetic. But you also ruled out fractions (of integers). It is possible with them. – Rudy Velthuis May 08 '19 at 20:57
  • FWIW, BigDecimals could work with a fixed scale. But they would be pretty wasteful. The square root algorithm used is the integer square root algorithm as implemented in BigInteger. – Rudy Velthuis May 08 '19 at 20:58
  • 1
    @RudyVelthuis: **What** is possible? Remember the context—Michael Chourdakis’s comment to use integers or fractions, to be the “most precise.” But an approximation with continued fractions is not inherently more precise than floating-point. And if it is not faster or otherwise more efficient, what is the point? My questions about integer and rational arithmetic illustrate that there are certain mathematical limitations to **any** fixed-size arithmetic, and integers and fractions are neither suitable for various work nor the “most precise.” – Eric Postpischil May 08 '19 at 20:59
  • @Eric: Fractions can be far more precise than IEEE-754 FP. And continued fractions are very precise, actually, especially when continuously used as fractions, and not converted to FP. – Rudy Velthuis May 08 '19 at 21:03
  • @RudyVelthuis: The information content of 32 bits cannot exceed 32 bits. – Eric Postpischil May 08 '19 at 21:04
  • @Eric: duh! But multiple 32 bit integers can contain far more info. Who restricted this to 32 bits? – Rudy Velthuis May 08 '19 at 21:07
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/193057/discussion-between-eric-postpischil-and-rudy-velthuis). – Eric Postpischil May 08 '19 at 21:33
  • @Eric: I am going to bed. But thanks for the invitation. – Rudy Velthuis May 08 '19 at 21:41
  • So from all the above discussion, I can conclude that there is no straightforward way to solve this. We have to deal with such problems when working with float and double. Grrrrr!!!! – pkthapa May 09 '19 at 04:32
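
For the curious, the "integer square root" Peter mentions in the comments above can be sketched with a few integer operations (Newton's method; this is illustrative code, not from the thread):

#include <cstdint>

// Largest y such that y * y <= n, using only integer arithmetic.
std::uint64_t isqrt(std::uint64_t n)
{
    if (n < 4) return n == 0 ? 0 : 1;   // isqrt(1) = isqrt(2) = isqrt(3) = 1
    std::uint64_t x = n / 2 + 1;        // initial guess, always >= sqrt(n)
    std::uint64_t y = (x + n / x) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;            // integer Newton step
    }
    return x;
}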

The output: same, although both float variables hold different values.

"float variables hold different values." is unfounded.

same was printed because the values of a and b are the same, even though the initialization constants differ.


A typical float is 32 bits and can represent about 2^32 different values, such as 1.0, 1024.0, 0.5, 0.125. These values are all of the form +/- some_integer * 2^some_other_integer.

1.012345679 and 1.012345678 are not in that float set. @Rudy Velthuis.

1.012345 67165374755859375 // `float` member
1.012345 678
1.012345 679
1.012345 790863037109375   // `float` member

Similar applies for double, yet with more precision: commonly 64 bits.

1.012345679 and 1.012345678 are not in that double set either:

1.012345 67799999997106397131574340164661407470703125    // `double` member
1.012345 678
1.012345 6780000001931085762407747097313404083251953125  // `double` member
...
1.012345 6789999998317597373898024670779705047607421875  // `double` member
1.012345 679
1.012345 67900000005380434231483377516269683837890625    // `double` member
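
The float neighbors shown in the tables above can be reproduced with std::nextafter (a small sketch, not part of the original answer):

#include <cmath>
#include <cstdio>

int main()
{
    float f = 1.012345679f;   // rounds to the nearest representable float
    std::printf("%.25f\n", f);                        // 1.01234567165374755859375...
    std::printf("%.25f\n", std::nextafterf(f, 2.0f)); // next float up:
                                                      // 1.012345790863037109375...
}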

It can be thought of as 2 steps of rounding. Code 1.012345679 is rounded to the nearest double 1.01234567900000005380434231483377516269683837890625. Then the assignment rounds the double to the nearest float 1.01234567165374755859375.

float a = 1.012345679;
// 'a' has the value of 1.01234567165374755859375

Likewise for b. Code 1.012345678 is rounded to the nearest double 1.01234567799999997106397131574340164661407470703125. Then the assignment rounds the double to the nearest float 1.01234567165374755859375.

float b = 1.012345678;
// 'b' has the value of 1.01234567165374755859375

a and b have the same value.
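
A complete program (a sketch, not from the answer) confirming both roundings: as floats the two literals compare equal, while as doubles they do not:

#include <cstdio>

int main()
{
    float  fa = 1.012345679, fb = 1.012345678;   // both round to the same float
    double da = 1.012345679, db = 1.012345678;   // two distinct doubles

    std::printf("fa = %.25f\n", fa);             // 1.0123456716537475585937500
    std::printf("fb = %.25f\n", fb);             // same stored value
    std::printf("fa == fb: %d\n", fa == fb);     // 1
    std::printf("da == db: %d\n", da == db);     // 0
}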

chux - Reinstate Monica

It's because floats have 7-digit precision. If you want better precision, you need to use double or long double.
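
For what it's worth, a quick sketch (using the question's EPSILON_VALUE; the name cmpd is made up) showing that with double these particular inputs do print "different", though the comments below explain why simply adding bits is not a general fix:

#include <cmath>
#include <cstdio>

// Same comparison as the question's cmpf, but with double throughout.
bool cmpd(double a, double b)
{
    return std::fabs(a - b) < 0.0000000000000001;   // the question's epsilon
}

int main()
{
    double a = 1.012345679, b = 1.012345678;        // now distinct values
    std::printf("%s\n", cmpd(a, b) ? "same" : "different");   // different
}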

Telenoobies
  • 1
    Just blindly throwing more bits at the problem is not a solution – Lightness Races in Orbit May 09 '19 at 01:07
  • Why should I use double when it's not required? I am able to run this with float too, but I am struggling due to precision. The same will happen with double when storing bigger values. Is there any straightforward solution to this? – pkthapa May 09 '19 at 04:30
  • Your problem comes from the fact that you need higher-precision types. It makes sense to use a data type which uses more bits. Care to explain why you are reluctant to use a different type? – Telenoobies May 09 '19 at 14:53
  • "because floats have 7 digit precision" --> not always. Sample counter example: Both 8589973, 8589974 and covert to 8589974.0f – chux - Reinstate Monica May 11 '19 at 21:51