Floating point arithmetic and machine epsilon

Question

I'm trying to compute an approximation of the epsilon value for the float type (and I know it's already in the standard library).

The epsilon values on this machine are (printed with some approximation):

 FLT_EPSILON = 1.192093e-07
 DBL_EPSILON = 2.220446e-16
LDBL_EPSILON = 1.084202e-19

FLT_EVAL_METHOD is 2 so everything is done in long double precision, and float, double and long double are 32, 64 and 96 bit.

I tried to get an approximation of the value starting from 1 and dividing it by 2 until it becomes too small, doing all operation with the float type:

# include <stdio.h>

int main(void)
{
    float floatEps = 1;

    while (1 + floatEps / 2 != 1)
        floatEps /= 2;

    printf("float eps = %e\n", floatEps);
}

The output is not what I was looking for:

float epsilon = 1.084202e-19

Intermediate operations are done with the greatest precision (due to the value of FLT_EVAL_METHOD), so this result seems legit.

However, this:

// 2.0 is a double literal
while ((float) (1 + floatEps / 2.0) != 1)
    floatEps /= 2;

gives this output, which is the right one:

float epsilon = 1.192093e-07

but this one:

// no double literals
while ((float) (1 + floatEps / 2) != 1)
    floatEps /= 2;

leads again to a wrong result, as the first one:

float epsilon = 1.084202e-19

These last two versions should be equivalent on this platform, is this a compiler bug? If not, what's happening?

Code is compiled with:

gcc -O0 -std=c99 -pedantic file.c

The gcc version is pretty old, but I'm at university and I can't update it:

$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8'
--with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4
--enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls
--enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
--enable-targets=all --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
--target=i486-linux-gnu
Thread model: posix
gcc version 4.4.5 (Debian 4.4.5-8)

Current version of gcc, 4.7, behaves correctly on my home computer. There are also comments saying that different versions give different results.

After some answers and comments, that clarified what is behaving as expected and what's not, I changed the question a little to make it clearer.

What compiler are you using, and what switches are you using (target, optimization settings, et cetera)? Are you positive that code with `(float) (1 + floatEps / 2.)` and code with `(float) (1 + floatEps / 2)` and no other differences gives difference results? The two source files differ only in that single character, and exactly the same compiler switches were used? — Eric Postpischil, Apr 17 '13 at 16:08
@EricPostpischil well, an old one, I added details. Anyway, I tested the code again and I confirm the results. — effeffe, Apr 17 '13 at 16:17
Can't reproduce with gcc 4.6.2. The cast to `float` produces the correct value (`FLT_EVAL_METHOD` is 2, x86) for both versions. — Daniel Fischer, Apr 17 '13 at 18:34
@DanielFischer yes, it seems like newer versions behave correctly. — effeffe, Apr 17 '13 at 18:58

Bryan Olivier · Accepted Answer · 2013-04-18T15:27:30.403

7

The compiler is allowed to evaluate float expressions in any bigger precision it likes, so it looks like the first expression is evaluated in long double precision. In the second expression you enforce scaling the result down to float again.

In answer to some of your additional questions and the discussion below: you are basically looking for the smallest non-zero difference with 1 of some floating point type. Depending on the setting of FLT_EVAL_METHOD a compiler may decide to evaluate all floating point expressions in a higher precision than the types involved. On a Pentium traditionally the internal registers of the floating point unit are 80 bits and it is convenient to use that precision for all the smaller floating point types. So in the end your test depends on the precision of your compare !=. In the absence of an explicit cast the precision of this comparison is determined by your compiler not by your code. With the explicit cast you scale the comparison down to the type you desire.

As you confirmed your compiler has set FLT_EVAL_METHOD to 2 so it uses the highest precision for any floating point calculation.

As a conclusion to the discussion below we are confident to say that there is a bug relating to implementation of the FLT_EVAL_METHOD=2 case in gcc prior to version 4.5 and that is fixed from of at least version 4.6. If the integer constant 2 is used in the expression instead of the floating point constant 2.0, the cast to float is omitted in the generated assembly. It is also worth noticing that from of optimization level -O1 the right results are produced on these older compilers, but the generated assembly is quite different and contains only few floating point operations.

edited Apr 18 '13 at 15:27

answered Apr 17 '13 at 15:18

Bryan Olivier

5,207
2
16
18

Are you saying that the cast actually reduced the precision of the stored value? If so, why do I still get the wrong result if I cast the result *without* using any `double`? (I'll add this to my question) – effeffe Apr 17 '13 at 15:24
“The compiler is allowed to evaluate float expressions in any bigger precision it likes” This depends how the compiler defined the macro `FLT_EVAL_METHOD`. The compiler cannot do as it likes, it has to do exactly as indicated by how it defines this macro. – Pascal Cuoq Apr 17 '13 at 15:26
@PascalCuoq You are right, but if the value of `FLT_EVAL_METHOD` is `-1` it again can do pretty much what it likes. – Bryan Olivier Apr 17 '13 at 15:28
@effeffe: A cast constrains the value to the actual type, so `(float) (1 + floatEps / 2)` has the value of a `float`. That explains why your second code sequence printed the desired value. I do not know what you mean by “get the wrong result if I cast the result *without* using any `double`”. I do not see a code sample where you cast the result and did not use `double`. The first sample contains neither a cast nor a `double`. The second contains both a cast and a `double`. – Eric Postpischil Apr 17 '13 at 15:28
I also added the value of that constant to the question, it is `2`. – effeffe Apr 17 '13 at 15:40
Note that I should have written “A cast constraints the value to the named type” above, not the “actual type”. – Eric Postpischil Apr 17 '13 at 15:50
About your edit: if it always use the greatest precision, shouldn't the last code I added give the same result of the other one without the `double` literal? – effeffe Apr 17 '13 at 15:51
The type of the literal makes no difference as it will go the highest precision immediately. The result is fully determined by the final cast you put on the expression. And if there is no cast your compiler will use `long double`. – Bryan Olivier Apr 17 '13 at 15:56
Then why do I get different results with different types of literals? – effeffe Apr 17 '13 at 15:58
@BryanOlivier: Whether the compiler uses the highest precision is subject to the “whims” of the compiler. Slight changes in code may affect optimization choices. – Eric Postpischil Apr 17 '13 at 16:06
@effeffe that is a very good question, to which I have no answer. – Bryan Olivier Apr 17 '13 at 16:09
1

@effeffe cont'd: The `gcc` compiler at my home is version 4.3.2 (Debian distribution) and exhibits the same behavior as your compiler as does the compiler of `codepad.org`. The `gcc` compiler at my work is version 4.1.2 (Redhat distribution) and gives the correct result in both cases, but it also has `FLT_EVAL_METHOD` set to 0. compiler bug? – Bryan Olivier Apr 17 '13 at 17:11
I also suspect a compiler bug. Determining that with certainty might not be easy in a Stack Overflow context. – Eric Postpischil Apr 17 '13 at 17:21
I had a look at the assembly. And I can see a `fstps/flds` combination in the correct case that takes care of the truncation to `float`, which is missing in the incorrect case, for the rest the assembly of both cases is virtually identical. On `-O1`, `-O2` and `-O3` my compiler does give the correct result in both cases. – Bryan Olivier Apr 17 '13 at 18:15
I was downloading the old version right to see the produced assembly. You're talking about the 4.3.2, right? Thanks for your effort, really appreciated, +1. – effeffe Apr 17 '13 at 18:21
Yep, 4.3.2 is mine at home. I guess yours is producing the same assembly. Maybe it is fixed in the newest version of `gcc`, but I don't have access to it. – Bryan Olivier Apr 17 '13 at 18:25
I also had a look at the assembly and I can see the situation you describe, it just doesn't add those two instruction that truncates the result. Since the C99 standard requires this conversion and my `FLT_EVAL_METHOD` is `2`, could we definitely say that this is a bug and gcc is not conforming to the standard here? If so, please add these details to the answer and I'll accept it (sorry if the question changed a little from the beginning, that's because you and the other users actually made me understand what's behaving as expected and what's not). Damn gcc, that was such a simple assignment... – effeffe Apr 18 '13 at 14:49
@effeffe "If so, please add these details to the answer.." you got it. – Bryan Olivier Apr 18 '13 at 15:28

score 3 · Answer 2 · answered Apr 17 '13 at 15:44

3

A C99 C compiler can evaluate floating-point expressions as if they were of a more precise floating-point type than their actual type.

The macro FLT_EVAL_METHOD is set by the compiler to indicate the strategy:

-1 indeterminable;

0 evaluate all operations and constants just to the range and precision of the type;

1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;

2 evaluate all operations and constants to the range and precision of the long double type.

For historical reasons, two common choices when targeting the x86 processors are 0 and 2.

File m.c is your first program. If I compile it, using my compiler, thus, I obtain:

$ gcc -std=c99 -mfpmath=387 m.c
$ ./a.out 
float eps = 1.084202e-19
$ gcc -std=c99  m.c
$ ./a.out 
float eps = 1.192093e-07

If I compile this other program below, the compiler sets the macro according to what it does:

#include <stdio.h>
#include <float.h>

int main(){
  printf("%d\n", FLT_EVAL_METHOD);
}

Results:

$ gcc -std=c99 -mfpmath=387 t.c
$ ./a.out 
2
$ gcc -std=c99 t.c
$ ./a.out 
0

answered Apr 17 '13 at 15:44

Pascal Cuoq

79,187
7
161
281

So, shouldn't the last two examples give the same result? If all constants and operations live in `long double`... – effeffe Apr 17 '13 at 15:55
@effeffe The cast in `(float) (1 + floatEps / 2)` constrains the result of this expression to be a `float`, even if the division and addition are done as `long double`. Joseph S. Myers gives the best interpretation of how modern GCCs are supposed to interpret this part of the standard here: http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00105.html (and he quotes C99: “Except for assignment and cast (which remove all extra range and precision) […]”) – Pascal Cuoq Apr 17 '13 at 16:01
But with `FLT_EVAL_METHOD == 2` shouldn't `2` and `2.` have the same effect, since there is no assignment or cast operator before them? (nice link) – effeffe Apr 17 '13 at 16:09
@effeffe Yes, with `FLT_EVAL_METHOD == 2`, `floatEps / 2` and `floatEps / 2.0` should have the same effect. The former is typed as `float`, the latter as `double`, but both divisions should be computed in extended precision. What makes you say that something different happens? – Pascal Cuoq Apr 17 '13 at 17:05
Well, the fact that I get two different results in those two cases, see my updated question. – effeffe Apr 17 '13 at 17:48
The only `2.` literal in your question as of version 9 is in the test `((float) (1 + floatEps / 2.) != 1)`. This test works because of the `(float)`, not because of the `2.`. There is no other mention of a `2.` double literal. – Pascal Cuoq Apr 17 '13 at 18:49
What I'm saying is that `(float) (1 + floatEps / 2.) != 1` and `(float) (1 + floatEps / 2) != 1` leads to two different results. But with `FLT_EVAL_METHOD` evaluating to `2` all floating point constants and expressions should be evaluated with the precision provided by `long double`, so `2` and `2.` shouldn't leads to different results, right? The cast to `float` is in both the versions. – effeffe Apr 17 '13 at 18:57
@effeffe Oh, I see now. That is simply a GCC bug. With `~/gcc-172652/bin/gcc -std=c99 -mfpmath=387 m.c` I get a program that prints `1.192093e-07`. 172652 is the number of the latest SVN snapshot I have available, from April 2011. Older versions of GCC also give the result you say, `1.084202e-19`, for me. – Pascal Cuoq Apr 17 '13 at 19:47

Floating point arithmetic and machine epsilon

2 Answers2

Linked