Issue in std::max() function comparison with fixed-point implementation

Question

Is there any standard function that can help me compute the max() or min() of two float values?

I have written fixed-point implementations of the min() and max() functions for the types q0s32 through q32s0 (33 types).

I want to test the precision loss of my functions against std::min() and std::max(), but the results I get from the std functions are not good.

I tried the following, but the result is not what I expected.

Code:

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    float num1 = 4.5000000054f;
    float num2 = 4.5000000057f;

    float resf = std::max(num1, num2);
    printf("Result is :%20.15f\n", resf);
    printf("num1 :%20.15f and num2 :%20.15f\n", num1, num2);
}
```

Output:

```
Result is :   4.500000000000000
num1 :   4.500000000000000 and num2 :   4.500000000000000
```

A float only has 6-7 digits of precision. Your 54 and 57 are falling out of what can be represented. If you want fixed point, you need to get or make a fixed-point library; C++ `float` and `double` are floating-point types. — NathanOliver, Jan 29 '20 at 15:53
This is a must read if you are going to step into the world of floating point math: https://stackoverflow.com/questions/588004/is-floating-point-math-broken — NathanOliver, Jan 29 '20 at 15:54
You use floats, which are 32-bit IEEE 754 floating-point numbers. Your values are beyond float precision. — gkhaos, Jan 29 '20 at 15:54
Neither 4.5000000054 nor 4.5000000057 is representable by a single-precision IEEE 754 data type, which is likely your `float`. Both `num1` and `num2` actually represent 4.5, which is what you get. — Daniel Langr, Jan 29 '20 at 15:56
I suggest you go one step back. Already in your example you see that `num1` and `num2` do not hold the value you expect (I suppose), so `std::max` isn't the issue here (there is no loss of precision due to `std::max`). — 463035818_is_not_an_ai, Jan 29 '20 at 16:04
You print `num1` and `num2` and they have already "lost precision", so how is this related to `std::max()`? Just do `resf = num1` and you will get exactly the same result. — Slava, Jan 29 '20 at 16:08

score 2 · Accepted Answer · edited Jan 29 '20 at 16:26


Most C++ implementations use the IEEE 754 standard for floating-point arithmetic. Here is some useful information regarding this issue:

In IEEE 754, float is a 32-bit single-precision floating-point number (1 bit for the sign, 8 bits for the exponent, and 23 stored bits for the significand, plus an implicit leading bit), i.e. float has about 7 decimal digits of precision.

In IEEE 754, double is a 64-bit double-precision floating-point number (1 bit for the sign, 11 bits for the exponent, and 52 stored bits for the significand, plus an implicit leading bit), i.e. double has about 15 decimal digits of precision.

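You need to use double instead to get the desired results: the two values differ in the 11th significant digit, which is within double's precision. Note that the literals must also be double literals (without the `f` suffix); otherwise they are rounded to float before being stored.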

answered Jan 29 '20 at 15:58 by M Hamza Razzaq, edited by François Andrieux


I don't think the C++ Standard defines a representation for floating-point types, see [basic.fundamental/12/4](http://eel.is/c++draft/basic.fundamental#12.sentence-4). – Daniel Langr Jan 29 '20 at 16:00

@DanielLangr -- that's formally correct, but in practice, it's IEEE 754 unless you're on exotic hardware. – Pete Becker Jan 29 '20 at 16:02
I agree that the C++ Standard doesn't specify it, but it's generally true in practice. Perhaps just edit the answer to clarify. According to the standard, a `float` and `double` could each be 8 bits. Practical implementations obviously don't do this. Not a fan of GeeksforGeeks; perhaps use a more reputable source like [cppreference](https://en.cppreference.com/w/cpp/language/types) – ChrisMM Jan 29 '20 at 16:06

Certain Arduino boards have [32 bit `double`s](https://www.arduino.cc/reference/en/language/variables/data-types/double/). It's a pretty common counter example. – François Andrieux Jan 29 '20 at 16:23
Whether correct in most cases or not, the statement on the page you link, "float is a 32 bit IEEE 754 single precision Floating Point Number", is simply wrong in the generality it claims. – 463035818_is_not_an_ai Jan 29 '20 at 16:41

